Abstract: Suppose that Alice wishes to send messages to Bob through a communication channel $C_1$, but her transmissions also reach an eavesdropper Eve through another channel $C_2$. This is the wiretap channel model introduced by Wyner in 1975. The goal is to design a coding scheme that makes it possible for Alice to communicate both reliably and securely. Reliability is measured in terms of Bob's probability of error in recovering the message, while security is measured in terms of the mutual information between the message and Eve's observations. Wyner showed that the situation is characterized by a single constant $C_s$, called the secrecy capacity, which has the following meaning: for all $\varepsilon > 0$, there exist coding schemes of rate $R \ge C_s - \varepsilon$ that asymptotically achieve both the reliability and the security objectives. However, his proof of this result is based upon a nonconstructive random-coding argument. To date, despite a considerable research effort, the only case where we know how to construct coding schemes that achieve secrecy capacity is when Eve's channel $C_2$ is an erasure channel, or a combinatorial variation thereof.

Polar codes were recently invented by Arikan; they approach the capacity of symmetric binary-input discrete memoryless channels with low encoding and decoding complexity. In this paper, we use polar codes to construct a coding scheme that achieves the secrecy capacity for a wide range of wiretap channels. Our construction works for any instantiation of the wiretap channel model, as long as both $C_1$ and $C_2$ are symmetric and binary-input, and $C_2$ is degraded with respect to $C_1$. Moreover, we show how to modify our construction in order to provide strong security, in the sense defined by Maurer, while still operating at a rate that approaches the secrecy capacity. In this case, we cannot guarantee that the reliability condition will also be satisfied unless the main channel $C_1$ is noiseless, although we believe it can always be satisfied in practice.

Index Terms: channel polarization, information-theoretic security, polar codes, secrecy capacity, strong security, wiretap channel



I. Introduction

The notion of wiretap channels was introduced by Aaron Wyner [42] in 1975. In this setting, Alice wishes to send messages to Bob through a communication channel $C_1$, called the main channel, but her transmissions also reach an adversary Eve through another channel $C_2$, called the wiretap channel. This is illustrated in Figure 1, wherein $U$ denotes a $k$-bit message that Alice wishes to communicate to Bob. We think of $U$ as a random variable that takes values in $\{0,1\}^k$; unlike most papers on wiretap channels, we do not assume anything regarding the a priori distribution of $U$. While making use of auxiliary random bits, the encoder maps $U$ into a sequence $X$ of $n$ channel symbols. This sequence is transmitted across the main channel and the wiretap channel, resulting in the corresponding channel outputs $Y$ and $Z$. Finally, the decoder maps $Y$ (deterministically) into an estimate $\hat{U}$ of the original message.

Figure 1. Block diagram of a generic wiretap-channel system

The goal is to design a coding scheme (namely, an encoding algorithm and a decoding algorithm) that makes it possible to communicate both reliably and securely, as the message length $k$ tends to infinity. Reliability is measured in terms of the probability of error in recovering the message. Specifically, the objective is to satisfy the following

$$\text{Reliability Condition:}\qquad \lim_{k\to\infty} \Pr\{\hat{U} \ne U\} = 0 \tag{1}$$



where the probability is over all the relevant coin tosses in the system: in the generation of $U$, in the encoder, and in the main channel. Security is usually measured in terms of the normalized mutual information between the message $U$ and Eve's observations $Z$. Specifically, one is interested in encoding algorithms that satisfy the following

$$\text{Security Condition:}\qquad \lim_{k\to\infty} \frac{I(U;Z)}{k} = 0 \tag{2}$$

Note that $I(U;Z)$ is equal to the difference between the a priori entropy $H(U)$ and the conditional entropy $H(U|Z)$. Thus, intuitively, (2) means that observing $Z$ does not provide much information about $U$ beyond what is available a priori, as compared to the message length $k$. Maurer argued in [28,29] that the conventional notion of security (2) is much too weak. Indeed, it is easy to construct examples where $k^{1-\delta}$ out of the $k$ message bits are disclosed to Eve, for some constant $\delta > 0$, while still satisfying (2). This is clearly unacceptable. Thus Maurer introduced in [28] an alternative

$$\text{Strong Security Condition:}\qquad \lim_{k\to\infty} I(U;Z) = 0 \tag{3}$$
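To see concretely why (2) is weaker than (3), suppose (for illustration only) that $U$ is uniform over $\{0,1\}^k$ and that Eve observes exactly $k^{1-\delta}$ of the message bits in the clear, ignoring rounding, for a fixed $0 < \delta < 1$. Then $I(U;Z) = H(U) - H(U|Z) = k - (k - k^{1-\delta})$, so

$$\frac{I(U;Z)}{k} = \frac{k^{1-\delta}}{k} = k^{-\delta} \xrightarrow[k\to\infty]{} 0, \qquad \text{whereas} \qquad I(U;Z) = k^{1-\delta} \xrightarrow[k\to\infty]{} \infty.$$

For instance, with $\delta = 1/2$ and $k = 10^6$, Eve learns $10^3$ message bits while $I(U;Z)/k = 10^{-3}$. Condition (2) is met in the limit, but (3) is not.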



Notice that both security conditions (2) and (3) are information-theoretic rather than computational: the adversary is assumed to be computationally unbounded, and security does not depend on computational hardness assumptions of any kind.
A. Prior Work

In 1975, Wyner [42] considered a special case of the system in Figure 1 where both $C_1$ and $C_2$ are discrete memoryless channels (DMCs) and, moreover, $C_2$ is degraded with respect to $C_1$. He proved that such a system is characterized by a single constant $C_s$, called the secrecy capacity, which has the following meaning. For all $\varepsilon > 0$, there exist coding schemes of information rate $R \ge C_s - \varepsilon$ that satisfy (1) and (2); conversely, it is not possible to satisfy both (1) and (2) at rates greater than $C_s$. Since 1975, Wyner's results have been extended to a variety of contexts, most notably Gaussian channels [23], general broadcast channels with confidential messages [12], and channels that impose a combinatorial (rather than probabilistic) constraint on the adversary [8,33]. In fact, the literature on wiretap channels encompasses, by now, hundreds of papers.

However, the vast majority of this work relies on nonconstructive random-coding arguments to establish the main results. Such results show that there exist codes that achieve secrecy capacity, but are of little use if one's goal is to design specific polynomial-time encoding/decoding algorithms. To the best of our knowledge, constructive solutions to the wiretap-channel problem are available only in two special cases. The first special case is when the main channel is noiseless and the wiretap channel is the binary erasure channel (BEC). A coding scheme for this case, using LDPC codes for the BEC, was presented in [37,39] and proved to achieve secrecy capacity. The other special case is when the adversary is constrained combinatorially: Eve can choose to observe some $t$ out of the $n$ transmitted symbols, while the remaining $n - t$ symbols are erased. This situation, studied by Ozarow and Wyner in [33], may be regarded as a combinatorial variation of an erasure channel. Provably optimal coding schemes for this case can be constructed from MDS codes [41], or using extractors [8]. We observe, however, that even for the simple situation where $C_1$ is noiseless and $C_2$ is a binary symmetric channel, it is not known how to explicitly construct codes that achieve secrecy capacity.

We point out that a general method of coding for the wiretap channel, often referred to as coset-coding or syndrome-coding, is well known. This method goes back to the work of Wyner [33,42], although it was significantly extended and generalized in [9,10] and other papers. Assume, for simplicity, that the input alphabet of both $C_1$ and $C_2$ is binary. In this case, the coset-coding method utilizes two binary linear codes: an "outer" code $\mathbb{C}^*$ and an "inner" code $\mathbb{C}$, such that $\mathbb{C} \subset \mathbb{C}^*$ and the difference $\dim(\mathbb{C}^*) - \dim(\mathbb{C})$ between their dimensions is $k$. This condition implies that $\mathbb{C}^*$ can be partitioned into $2^k$ cosets of $\mathbb{C}$. A message $u \in \{0,1\}^k$ is conveyed by Alice via the choice of one of these $2^k$ cosets, say $a + \mathbb{C}$. What is transmitted by Alice is a vector $X$ that is selected uniformly at random from $a + \mathbb{C}$. Loosely speaking, the outer code $\mathbb{C}^*$ serves to correct the errors on the main channel, and thus ensures reliability, while the inner code $\mathbb{C}$, over which $X$ is randomized, ensures security. The trouble is that it is not known how to explicitly construct a sequence of outer codes $\mathbb{C}^*$ and inner codes $\mathbb{C}$ that satisfy conditions (1) and (2) at a rate that approaches the secrecy capacity as $n \to \infty$. The work of Cohen and Zémor [9,10] shows that a random choice of the inner code $\mathbb{C}$ suffices to achieve strong security, in a very general setting. Notably, the proof of this result in [9,10] does not assume that the messages are uniformly random a priori. Still, to the best of our knowledge, the only cases where explicit constructions of $\mathbb{C}$ and $\mathbb{C}^*$ are known are those described in the foregoing paragraph (cf. [37,39]).
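A deliberately tiny instance of coset coding may help fix ideas. The sketch below is our own illustration, not a construction taken from [9,10,33,42]: it assumes a noiseless main channel, $n = 4$, the outer code $\mathbb{C}^* = \{0,1\}^4$, and the inner code $\mathbb{C} = \mathrm{span}\{1100, 0011\}$, so that $k = 2$. The names `encode`, `decode`, `INNER_BASIS`, and `LEADER_BASIS` are hypothetical. Bob recovers $u$ from the syndrome of $X$ with respect to $\mathbb{C}$, while an observer of only the first and third coordinates sees a uniformly distributed pair whatever $u$ is.

```python
import random
from itertools import product

# Illustrative toy (our assumption, not from the cited works): n = 4,
# outer code C* = {0,1}^4, inner code C = span{1100, 0011}, so k = 2.
# All names below are hypothetical.
INNER_BASIS = [(1, 1, 0, 0), (0, 0, 1, 1)]
LEADER_BASIS = [(1, 0, 0, 0), (0, 0, 1, 0)]  # together with C, these span C*


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def combine(coeffs, basis, n=4):
    """GF(2) linear combination of the rows of `basis`."""
    out = (0,) * n
    for c, row in zip(coeffs, basis):
        if c:
            out = xor(out, row)
    return out


def encode(u, rng):
    """Select the coset a + C labelled by u, then a uniform vector in it."""
    a = combine(u, LEADER_BASIS)                 # coset representative for u
    r = [rng.randrange(2) for _ in INNER_BASIS]  # Alice's auxiliary random bits
    return xor(a, combine(r, INNER_BASIS))


def decode(x):
    """The syndrome of x with respect to C identifies the coset, hence u."""
    return (x[0] ^ x[1], x[2] ^ x[3])


rng = random.Random(1)
for u in product((0, 1), repeat=2):
    x = encode(u, rng)
    print(u, x, decode(x))
```

The protection in this toy holds only against an observer of coordinates 0 and 2 (0-based); anyone who sees both coordinates of a pair learns the corresponding message bit, which is why the inner code must be matched to the wiretap channel.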

B. Our Contributions 

In this paper, we present a coding scheme that achieves the secrecy capacity of wiretap channels whenever $C_1$ and $C_2$ are binary-input symmetric DMCs and $C_2$ is degraded with respect to $C_1$. This is the situation originally studied by Wyner; it includes the important special case where $C_1$ and $C_2$ are arbitrary binary symmetric channels. We are able to satisfy the reliability and security conditions (1) and (2) with explicit polynomial-time encoding/decoding algorithms. In fact, the number of operations required for encoding and decoding is only $O(n \log n)$. Our construction is based upon key results in the literature on polar codes, recently invented by Arikan [3].

It is proved in [3] that polar codes achieve the capacity of arbitrary binary-input symmetric DMCs, with low encoding and decoding complexity. The proof of this result is based on a phenomenon called channel polarization. Let

$$G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \tag{4}$$

and let $G^{\otimes m}$ denote the $m$-th Kronecker power of $G$. Let $W$ be a symmetric binary-input DMC, and let $V = (V_1, V_2, \ldots, V_n)$ be a block of $n = 2^m$ bits chosen uniformly at random from $\{0,1\}^n$. Suppose $V$ is encoded as $X = V P_n G^{\otimes m}$, where $P_n$ is the $n \times n$ bit-reversal permutation matrix. Finally, $X$ is transmitted through $n$ independent copies of $W$, as shown below:

$$(V_1, V_2, \ldots, V_n) \;\xrightarrow{\;P_n G^{\otimes m}\;}\; (X_1, X_2, \ldots, X_n) \;\xrightarrow{\;W,\,\ldots,\,W\;}\; (Y_1, Y_2, \ldots, Y_n) \tag{5}$$
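The transformation $X = V P_n G^{\otimes m}$ in (5) is easy to reproduce numerically. The following is a minimal sketch using NumPy (an assumption; any GF(2) arithmetic would do) with hypothetical function names `bit_reverse`, `arikan_matrix`, and `polar_transform`. It builds $G^{\otimes m}$ by repeated Kronecker products, applies the bit-reversal permutation, and reduces modulo 2. It costs $O(n^2)$ and is meant for illustration, not as the $O(n \log n)$ encoder.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch of X = V P_n G^{(x)m}.


def bit_reverse(i, m):
    """Reverse the m-bit binary expansion of i."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def arikan_matrix(m):
    """Return G_n = P_n G^{(x)m} over GF(2) for n = 2**m."""
    G = np.array([[1, 0], [1, 1]], dtype=np.int64)
    kron = np.ones((1, 1), dtype=np.int64)
    for _ in range(m):
        kron = np.kron(kron, G)            # m-th Kronecker power of G
    n = 1 << m
    P = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        P[i, bit_reverse(i, m)] = 1        # bit-reversal permutation matrix
    return (P @ kron) % 2


def polar_transform(v, m):
    """Compute X = V P_n G^{(x)m} over GF(2) for a length-2**m bit vector V."""
    v = np.asarray(v, dtype=np.int64)
    return (v @ arikan_matrix(m)) % 2


print(arikan_matrix(2))
print(polar_transform([0, 1, 1, 0], 2))
```

A convenient sanity check is that $P_n$ commutes with $G^{\otimes m}$ (see [3]) and that $G^{\otimes m}$ is its own inverse over GF(2), so the matrix returned by `arikan_matrix` squares to the identity modulo 2.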



Arikan [3] considered the $n$ channels "seen" by each of the $n$ individual bits $V_1, V_2, \ldots, V_n$ as they undergo the transformation in (5). Let us call them the bit-channels; for a precise definition of the notion of a bit-channel, see [3] and Section III. It is shown in [3] that as $m$ grows, the bit-channels start polarizing: they approach either a noiseless channel or a pure-noise channel. We will say that the former bit-channels are good while the latter are bad (again, see Section III for a rigorous definition). One of the key results of [3] is that the fraction of bit-channels that are good approaches the capacity of $W$ as $n \to \infty$.

Given the channel polarization phenomenon, the general idea 
of our construction is quite simple. We will transmit random 
bits over those bit-channels that are good for both Eve and Bob, 
information bits over those bit-channels that are good for Bob 
but bad for Eve, and zeros over those bit-channels that are bad 
for both Bob and Eve. In the rest of this paper, we make this 
idea precise and prove that it works. 
In Section II, we briefly recap relevant results from the literature on wiretap channels, in order to obtain a simple expression for the secrecy capacity $C_s$ in the case where $C_1$ and $C_2$ are symmetric DMCs and $C_2$ is degraded with respect to $C_1$. In Section III, we provide the necessary background on polar codes and establish a certain property of channel polarization that is crucial for our construction (Lemma 4). The construction itself, namely the proposed coding scheme, is presented in Section IV. In Section V, we prove that the proposed coding scheme satisfies the reliability and security conditions (1) and (2). We also show in Section V that the rate $k/n$ of our coding scheme approaches the secrecy capacity $C_s$ as $n \to \infty$.

In Section VI, we consider the stronger notion of security (3). It was shown by Maurer and Wolf [29] that any coding scheme that satisfies the "weak" security condition (2) can be converted into a coding scheme that satisfies the stronger condition (3). This is accomplished using an ingenious information reconciliation and privacy amplification protocol [7]. Although, in principle, the rate overhead necessary for privacy amplification can be made arbitrarily small, this is unlikely to be the case in practice. In Section VI, we show how to modify the coding scheme of Section IV in order to guarantee strong security directly, without the need for privacy amplification. This is achieved by suitably modifying our definition of "bad" bit-channels, in a manner that differs from the generally accepted notions [3]. As a result, under the modified definition, a vanishing fraction of bit-channels could be good for Eve but bad for Bob, even when the wiretap channel is degraded with respect to the main channel. In this situation, the rate of the coding scheme of Section VI still approaches the secrecy capacity $C_s$, but we cannot guarantee that the reliability condition (1) is satisfied, unless the main channel is noiseless. Nevertheless, we believe that, in practice, acceptably low probabilities of error could be achieved on the main channel (using a more elaborate decoding algorithm).

We conclude the paper in Section VII with a brief discussion of further results. In particular, we explain there that our construction generalizes straightforwardly to the case where the channels $C_1$ and $C_2$ are not symmetric, although in this case polar codes become less explicit and only the "symmetric secrecy capacity" can be achieved. A few open problems that stem from our results herein are also discussed at the end of the paper.

C. Related Work 

Following the publication of a preliminary version of this paper in [25] and [26], several related papers have appeared [1,17,21]. Most notably, the work of Hof and Shamai [17] on polar coding for wiretap channels is independent of and contemporaneous with ours. While some of the main results in [17] and in this paper are similar, there are important differences that we would like to emphasize. One key difference is that Hof and Shamai [17] analyze their polar-coding scheme in detail, including recursive channel combining and splitting, whereas we treat polar codes essentially as a black box. We believe this makes our proof both shorter and clearer.

There are also significant differences between the results established in [1,17,21] and in this paper. In particular, it is shown in [1,17] that polar coding achieves the entire rate-equivocation region (see [24] for a definition), whereas we are interested only in the extreme point of this region that corresponds to secrecy capacity. On the other hand, in several other respects, our results are stronger than those of [1,17,21]. First, the proof in [1,17] is contingent on the assumption that the message $U$ is a priori uniform over $\{0,1\}^k$, whereas we do not place any constraints on the a priori distribution of $U$. Assuming that messages are a priori uniform is common in information theory, but such assumptions are completely unacceptable in cryptography [6,15]. Even more importantly, we show how polar coding should be used to provide strong security, whereas the work of [1,17,21] provides weak security only. Again, in cryptographic applications, conventional weak security is usually unacceptable.

II. Secrecy Capacity 

In this section, we first establish some relevant terminology. We then briefly recap the results of [12,22] to provide a simple expression for the secrecy capacity $C_s$ in the case where $C_1$ and $C_2$ are symmetric DMCs and $C_2$ is degraded with respect to $C_1$.

We will limit our consideration to finite-input and finite-output discrete memoryless channels throughout. Such a channel is a triple $(\mathcal{X}, \mathcal{Y}, W)$, where $\mathcal{X}, \mathcal{Y}$ are finite sets and $W$ is an $|\mathcal{X}| \times |\mathcal{Y}|$ matrix with $W[x,y]$ being the probability of receiving $y \in \mathcal{Y}$ given that $x \in \mathcal{X}$ was sent. We will follow the convention of [3,4,19] and write $W(y|x)$ instead of $W[x,y]$.

A matrix $M$ is strongly symmetric if the rows of $M$ are permutations of each other and the columns of $M$ are permutations of each other. A channel $(\mathcal{X}, \mathcal{Y}, W)$ is strongly symmetric if $W$ is a strongly symmetric matrix. Following [3,4,14,19], we will say that $(\mathcal{X}, \mathcal{Y}, W)$ is symmetric (often called output-symmetric) if the columns of $W$ can be partitioned into subsets such that each subset forms a strongly symmetric matrix. The capacity of a symmetric channel $(\mathcal{X}, \mathcal{Y}, W)$ is given by

$$C(W) = H(X) - H(X|Y) = \log_2|\mathcal{X}| - H(X|Y) \tag{6}$$

where the random variable $X$ at the input to the channel is uniform over $\mathcal{X}$, and $Y$ is the corresponding random variable at the channel output (for a proof of this fact, see [14, p. 94]). An important example of a symmetric channel is the binary symmetric channel $\mathrm{BSC}(p) = (\{0,1\}, \{0,1\}, W)$ with

$$W = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
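Formula (6) can be evaluated directly from a transition matrix. The sketch below, with hypothetical helper names `entropy`, `symmetric_capacity`, and `bsc`, computes $\log_2|\mathcal{X}| - H(X|Y)$ for uniform $X$; for $\mathrm{BSC}(p)$ it should reproduce the familiar value $1 - h_2(p)$. It equals the capacity only for symmetric channels in the sense above, since for nonsymmetric channels the uniform input need not be optimal.

```python
import math

# Hypothetical helper names; a sketch of (6), valid for symmetric channels only.


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def symmetric_capacity(W):
    """log2|X| - H(X|Y) with X uniform; W[x][y] is the transition probability."""
    nx, ny = len(W), len(W[0])
    px = 1.0 / nx
    h_x_given_y = 0.0
    for y in range(ny):
        p_y = sum(px * W[x][y] for x in range(nx))
        if p_y > 0:
            posterior = [px * W[x][y] / p_y for x in range(nx)]
            h_x_given_y += p_y * entropy(posterior)
    return math.log2(nx) - h_x_given_y


def bsc(p):
    return [[1 - p, p], [p, 1 - p]]


print(symmetric_capacity(bsc(0.11)), 1 - entropy([0.11, 0.89]))
```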

Given a channel $C_1 = (\mathcal{X}, \mathcal{Y}, W_1)$, we say that another channel $C_2 = (\mathcal{X}, \mathcal{Z}, W_2)$ is degraded with respect to $C_1$ if there exists a third channel $C_3 = (\mathcal{Y}, \mathcal{Z}, W_3)$ such that $C_2$ is the cascade of $C_1$ and $C_3$. Specifically, it is required that

$$W_2(z|x) = \sum_{y\in\mathcal{Y}} W_1(y|x)\, W_3(z|y) \tag{7}$$

for all $x \in \mathcal{X}$ and $z \in \mathcal{Z}$. Note that whenever $0 \le p_1 \le p_2 \le 1/2$, the channel $C_2 = \mathrm{BSC}(p_2)$ is degraded with respect to $C_1 = \mathrm{BSC}(p_1)$.
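Condition (7) says that $W_2 = W_1 W_3$ as a product of stochastic matrices (rows indexed by inputs). For the BSC example, one convenient choice, which is our addition and is easily checked by multiplying out, is $C_3 = \mathrm{BSC}(q)$ with $q = (p_2 - p_1)/(1 - 2p_1)$, valid for $0 \le p_1 \le p_2 \le 1/2$ and $p_1 < 1/2$. The sketch below, using NumPy and the hypothetical helper names `bsc`, `degrading_channel`, and `is_degraded_via`, checks (7) numerically.

```python
import numpy as np

# Hypothetical helper names; a numerical check of the cascade condition (7).


def bsc(p):
    """Transition matrix of BSC(p): rows indexed by input, columns by output."""
    return np.array([[1 - p, p], [p, 1 - p]])


def degrading_channel(p1, p2):
    """Return BSC(q) such that BSC(p1) followed by BSC(q) equals BSC(p2)."""
    if not (0 <= p1 <= p2 <= 0.5 and p1 < 0.5):
        raise ValueError("need 0 <= p1 <= p2 <= 1/2 and p1 < 1/2")
    q = (p2 - p1) / (1 - 2 * p1)   # crossover p1 + q(1 - 2 p1) must equal p2
    return bsc(q)


def is_degraded_via(W1, W2, W3, tol=1e-12):
    """Check (7): W2(z|x) = sum_y W1(y|x) W3(z|y) for all x, z."""
    return np.allclose(W1 @ W3, W2, atol=tol)


W3 = degrading_channel(0.05, 0.2)
print(W3[0, 1], is_degraded_via(bsc(0.05), bsc(0.2), W3))
```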
The secrecy capacity $C_s$ of the wiretap-channel system in Figure 1 is defined as follows. First, assume that the message $U$ is uniformly random over $\{0,1\}^k$. Then $C_s$ is the supremum over all rates $R = k/n$ (in bits per channel use) such that there exist coding schemes of rate $R$ satisfying conditions (1) and (2). For the general case where $C_1$ and $C_2$ are arbitrary DMCs, computing the secrecy capacity is a difficult problem. Let $X$ denote the single-letter input to $C_1$ and $C_2$, and let $Y$ and $Z$ denote the corresponding single-letter outputs. The best known expression for the secrecy capacity $C_s$, given by Csiszár and Körner in [12], is

$$C_s = \max_{\mathcal{U}} \big( I(\mathcal{U};Y) - I(\mathcal{U};Z) \big)$$

where the maximum is taken over all random variables $\mathcal{U}$ such that $\mathcal{U} \to X \to (Y,Z)$ is a Markov chain. The problem is that this maximization is often difficult to evaluate, and there is no simpler expression for the secrecy capacity even when $C_1$ and $C_2$ are both strongly symmetric, unless additional constraints are satisfied. See [24,40] for more details on this.

However, when $C_1 = (\mathcal{X}, \mathcal{Y}, W^*)$ and $C_2 = (\mathcal{X}, \mathcal{Z}, W)$ are symmetric and $C_2$ is degraded with respect to $C_1$, a simple expression for $C_s$ was given by Leung-Yan-Cheong in [22]. It is shown in [22, Theorem 4] that in this case

$$C_s = C(W^*) - C(W) = H(X|Z) - H(X|Y) \tag{8}$$

where $X$ is uniform over $\mathcal{X}$. In particular, if the main channel is $\mathrm{BSC}(p_1)$ while the wiretap channel is $\mathrm{BSC}(p_2)$, with $p_1 \le p_2 \le 1/2$, then the secrecy capacity is given by $h_2(p_2) - h_2(p_1)$, where $h_2(\cdot)$ is the binary entropy function.
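As a worked instance of (8), take the illustrative values $p_1 = 0.1$ and $p_2 = 0.2$ (our choice), with $h_2(p) = -p\log_2 p - (1-p)\log_2(1-p)$:

$$C_s = h_2(0.2) - h_2(0.1) \approx 0.7219 - 0.4690 = 0.2529 \ \text{bits per channel use}.$$

For comparison, $C(W^*) = 1 - h_2(0.1) \approx 0.5310$, so in this example roughly half of the main channel's capacity can be used for secret messages.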

III. Polar Codes 

This section provides a concise overview of the groundbreaking work of Arikan [3] and others [4,19,20] on polar codes and channel polarization. We establish only those results that are essential for the coding schemes presented in this paper.

As in [16,20], we consider exclusively binary-input symmetric memoryless (BSM) discrete channels. Such a channel is a symmetric DMC, as defined in the previous section, with input alphabet $\mathcal{X} = \{0,1\}$. With a slight abuse of notation, we will often follow [3,4,20] and simply write $W$ to denote a BSM channel $(\{0,1\}, \mathcal{Y}, W)$. The Bhattacharyya parameter of $W$ is

$$Z(W) \overset{\text{def}}{=} \sum_{y\in\mathcal{Y}} \sqrt{W(y|0)\,W(y|1)}$$

It can be shown that $Z(W)$ always takes values in $[0,1]$. Intuitively, channels with $Z(W) \le \varepsilon$ are almost noiseless, while channels with $Z(W) \ge 1 - \varepsilon$ are almost pure-noise channels. This intuition is made precise in [3, Proposition 1].
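For the two channels most often used as examples, the sum defining $Z(W)$ has a closed form: $Z(\mathrm{BSC}(p)) = 2\sqrt{p(1-p)}$ and $Z(\mathrm{BEC}(\varepsilon)) = \varepsilon$. A short sketch that evaluates the definition directly from a transition matrix (the helper names `bhattacharyya`, `bsc`, and `bec` are hypothetical):

```python
import math

# Hypothetical helper names; evaluates Z(W) = sum_y sqrt(W(y|0) W(y|1)).


def bhattacharyya(W):
    """W[0][y] and W[1][y] are the output distributions for inputs 0 and 1."""
    return sum(math.sqrt(a * b) for a, b in zip(W[0], W[1]))


def bsc(p):
    return [[1 - p, p], [p, 1 - p]]


def bec(eps):
    # Outputs ordered as (0, erasure, 1).
    return [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]


print(bhattacharyya(bsc(0.11)), 2 * math.sqrt(0.11 * 0.89))
print(bhattacharyya(bec(0.3)))
```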

Arikan [3] introduces a number of channels that are associated with the transformation in (5). First, there is the channel $(\{0,1\}^n, \mathcal{Y}^n, W^n)$ given by

$$W^n(y|x) = \prod_{i=1}^{n} W(y_i|x_i) \tag{9}$$

where $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. This is the channel that results from $n$ independent uses of the channel $W$. Next, for all $n = 2^m$, let us define the Arikan transform matrix $G_n \overset{\text{def}}{=} P_n G^{\otimes m}$, where $G$ is the matrix in (4) and $P_n$ is the bit-reversal permutation matrix defined in [3, Section VII-B]. Arikan then introduces the "combined" channel $(\{0,1\}^n, \mathcal{Y}^n, \widetilde{W})$ with transition probabilities given by

$$\widetilde{W}(y|v) \overset{\text{def}}{=} W^n(y \mid vG_n) = W^n(y \mid vP_nG^{\otimes m}) \tag{10}$$

This is the channel seen by the random vector $(V_1, V_2, \ldots, V_n)$ as it undergoes the transformation in (5). Arikan [3] also defines the channel $(\{0,1\}, \mathcal{Y}^n \times \{0,1\}^{i-1}, W_i)$ that is seen by the $i$-th bit $V_i$, for $i = 1, 2, \ldots, n$, as follows. Let $\mathbf{v}_i = (v_1, v_2, \ldots, v_i)$ denote a binary vector of length $i$, with the convention that $\mathbf{v}_0$ is the empty string and that $\{0,1\}^0 = \{\mathbf{v}_0\}$. Then

$$W_i(y, \mathbf{v}_{i-1} \mid v_i) \overset{\text{def}}{=} \frac{1}{2^{n-1}} \sum_{\mathbf{v} \in \{0,1\}^{n-i}} \widetilde{W}\big(y \,\big|\, (\mathbf{v}_{i-1}, v_i, \mathbf{v})\big) \tag{11}$$

where $(\cdot,\cdot)$ denotes vector concatenation. It is easy to show that $W_i(y, \mathbf{v}_{i-1} \mid v_i)$ is indeed the probability of the event that $(Y_1, Y_2, \ldots, Y_n) = y$ and $(V_1, V_2, \ldots, V_{i-1}) = \mathbf{v}_{i-1}$ given the event $V_i = v_i$, provided $V = (V_1, V_2, \ldots, V_n)$ is a priori uniform over $\{0,1\}^n$. Consequently, if one considers a "hypothetical decoder" that attempts to estimate the $i$-th bit $V_i$ having observed $y$ and $\mathbf{v}_{i-1}$, then $W_i$ is the effective channel seen by such a decoder (again, provided $V$ is a priori uniform). We will refer to $W_i$ as the $i$-th bit-channel, for $i = 1, 2, \ldots, n$.
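Definitions (10) and (11) can be evaluated by brute force for very small $n$, which is a useful way to check one's understanding of the bit-channels. The sketch below uses NumPy and the hypothetical helper names `arikan_matrix`, `bit_channel`, and `bhattacharyya_of_table`; it is exponential in $n$, so it is only practical for $n \le 8$ or so. Indices are 0-based in the code, so `i = 0` corresponds to $W_1$. For $W = \mathrm{BEC}(\varepsilon)$ and $n = 2$ it should reproduce the values $Z(W_1) = 2\varepsilon - \varepsilon^2$ and $Z(W_2) = \varepsilon^2$ known from [3].

```python
import math
from itertools import product

import numpy as np

# Hypothetical helper names; brute-force evaluation of (10)-(11), small n only.


def arikan_matrix(m):
    """G_n = P_n G^{(x)m} over GF(2), rows permuted by bit reversal."""
    n = 1 << m
    kron = np.ones((1, 1), dtype=np.int64)
    for _ in range(m):
        kron = np.kron(kron, np.array([[1, 0], [1, 1]], dtype=np.int64))
    rev = [int(format(i, "b").zfill(m)[::-1], 2) for i in range(n)]
    return kron[rev] % 2


def bec(eps):
    """BEC(eps) with outputs ordered (0, erasure, 1)."""
    return [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]


def bit_channel(W, m, i):
    """Return {(y, prefix): (W_i(y, prefix | 0), W_i(y, prefix | 1))}, 0-based i."""
    G = arikan_matrix(m)
    n = G.shape[0]
    table = {}
    for y in product(range(len(W[0])), repeat=n):
        for prefix in product((0, 1), repeat=i):
            probs = []
            for b in (0, 1):
                total = 0.0
                for tail in product((0, 1), repeat=n - i - 1):
                    x = np.array(prefix + (b,) + tail) @ G % 2  # x = v G_n, (10)
                    total += math.prod(W[xj][yj] for xj, yj in zip(x, y))  # (9)
                probs.append(total / 2 ** (n - 1))
            table[(y, prefix)] = tuple(probs)
    return table


def bhattacharyya_of_table(table):
    """Z(W_i): sum over the outputs (y, prefix) of sqrt(W_i(.|0) W_i(.|1))."""
    return sum(math.sqrt(p0 * p1) for p0, p1 in table.values())


for i in range(2):
    print(i + 1, bhattacharyya_of_table(bit_channel(bec(0.3), 1, i)))
```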

Observe that the optimal decision rule for the hypothetical decoder of the foregoing paragraph is trivial: decide $\hat{v}_i = 0$ if

$$W_i(y, \mathbf{v}_{i-1} \mid 0) > W_i(y, \mathbf{v}_{i-1} \mid 1) \tag{12}$$

and $\hat{v}_i = 1$ otherwise. One can invoke this decision rule iteratively for all $i = 1, 2, \ldots, n$, while substituting the first $i-1$ decisions $(\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_{i-1})$ in place of the hypothetical observations $\mathbf{v}_{i-1}$. Up to a small modification described later, this is the successive cancellation decoder invented by Arikan [3].
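The successive cancellation rule (12) can likewise be sketched by brute force, recomputing the sums in (11) from the decisions made so far. This is an exponential-time illustration under our own naming (`sc_decode`, `likelihood`, `arikan_matrix`, and the `frozen` argument are hypothetical), not Arikan's $O(n \log n)$ implementation. Ties go to 1, as written in (12), rather than being broken at random. The optional `frozen` argument anticipates the modification for fixed inputs that is introduced with polar codes below.

```python
import math
from itertools import product

import numpy as np

# Hypothetical helper names; exponential-time sketch of rule (12), small n only.


def arikan_matrix(m):
    """G_n = P_n G^{(x)m} over GF(2), rows permuted by bit reversal."""
    n = 1 << m
    kron = np.ones((1, 1), dtype=np.int64)
    for _ in range(m):
        kron = np.kron(kron, np.array([[1, 0], [1, 1]], dtype=np.int64))
    rev = [int(format(i, "b").zfill(m)[::-1], 2) for i in range(n)]
    return kron[rev] % 2


def likelihood(W, x, y):
    """W^n(y|x) for a memoryless channel given as W[x][y], as in (9)."""
    return math.prod(W[xj][yj] for xj, yj in zip(x, y))


def sc_decode(y, W, m, frozen=frozenset()):
    """Successive cancellation with 0-based indices; frozen positions are set to 0.

    The common factor 2^{-(n-1)} of (11) is omitted since it cancels.
    """
    G = arikan_matrix(m)
    n = G.shape[0]
    v_hat = []
    for i in range(n):
        if i in frozen:
            v_hat.append(0)
            continue
        score = [0.0, 0.0]
        for b in (0, 1):
            for tail in product((0, 1), repeat=n - i - 1):
                x = np.array(v_hat + [b] + list(tail)) @ G % 2
                score[b] += likelihood(W, x, y)
        v_hat.append(0 if score[0] > score[1] else 1)   # rule (12)
    return v_hat


bsc = [[0.9, 0.1], [0.1, 0.9]]
v = [0, 0, 1, 0]
y = tuple(int(t) for t in np.array(v) @ arikan_matrix(2) % 2)
print(sc_decode(y, bsc, 2, frozen={0, 1}))
```

For example, with $n = 4$, $W = \mathrm{BSC}(0.1)$, and the first two positions frozen to zero, passing the error-free output $y = vG_4$ should recover $v$ for each of the four possible messages.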

Following [4,19], let us partition the $n$ bit-channels into good channels and bad channels as follows. Let $[n] = \{1, 2, \ldots, n\}$ and let $\beta < 1/2$ be a fixed positive constant. Then the index sets of the good and bad channels are given by

$$\mathcal{G}_n(W,\beta) \overset{\text{def}}{=} \{ i \in [n] : Z(W_i) < 2^{-n^{\beta}}/n \} \tag{13}$$

$$\mathcal{B}_n(W,\beta) \overset{\text{def}}{=} \{ i \in [n] : Z(W_i) \ge 2^{-n^{\beta}}/n \} \tag{14}$$

One of the key results of [3,4] is that the fraction of the good channels approaches the channel capacity $C(W)$, given by (6), as $n \to \infty$. We state this result precisely as follows.

Theorem 1. For any BSM channel $W$ and any constant $\beta < 1/2$ we have

$$\lim_{n\to\infty} \frac{|\mathcal{G}_n(W,\beta)|}{n} = C(W)$$
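For the BEC, the Bhattacharyya parameters of all bit-channels can be computed exactly, because $Z(W^-) = 2Z - Z^2$ and $Z(W^+) = Z^2$ hold with equality for erasure channels [3]. The sketch below (hypothetical helper names `bec_bit_channel_z` and `good_set`) uses this recursion to compute $|\mathcal{G}_n(W,\beta)|/n$ from (13) for $W = \mathrm{BEC}(1/2)$, whose capacity is $1/2$. Convergence in Theorem 1 is slow, so at moderate $n$ the printed fractions may be noticeably below $C(W)$.

```python
# Hypothetical helper names; exact bit-channel parameters for W = BEC(eps).


def bec_bit_channel_z(eps, m):
    """Exact Z(W_1), ..., Z(W_n) for W = BEC(eps) and n = 2**m.

    The children of the j-th channel are placed at positions 2j-1 (minus)
    and 2j (plus), matching the indexing of the bit-channels in [3].
    """
    z = [eps]
    for _ in range(m):
        z = [w for zj in z for w in (2 * zj - zj * zj, zj * zj)]
    return z


def good_set(z, beta):
    """G_n(W, beta) of (13): 1-based indices with Z(W_i) < 2^{-n^beta} / n."""
    n = len(z)
    threshold = 2.0 ** (-(n ** beta)) / n
    return {i + 1 for i, zi in enumerate(z) if zi < threshold}


eps, beta = 0.5, 0.4
for m in (6, 10, 14, 18):
    n = 2 ** m
    frac = len(good_set(bec_bit_channel_z(eps, m), beta)) / n
    print(f"n = {n:6d}  |G_n|/n = {frac:.3f}  C(W) = {1 - eps}")
```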

Theorem 1 readily leads to a construction of capacity-achieving polar codes. The general idea is to transmit the information bits over the good bit-channels while fixing the input to the bad bit-channels to a priori known values, say zeros.¹ Formally, given a vector $v$ of length $n$ and a set $\mathcal{A} \subseteq [n]$, let $v_{\mathcal{A}}$ denote the projection of $v$ on the coordinates in $\mathcal{A}$. Each subset $\mathcal{A}$ of $[n]$ of size $|\mathcal{A}| = k$ specifies a polar code $\mathbb{C}_n(\mathcal{A})$ of rate $k/n$. We define $\mathbb{C}_n(\mathcal{A})$ via its encoder map $\mathcal{E}: \{0,1\}^k \to \{0,1\}^n$. Given a message $u \in \{0,1\}^k$, the encoder proceeds in two steps. First, the encoder constructs the vector $v \in \{0,1\}^n$ by setting $v_{\mathcal{A}} = u$ and $v_{\mathcal{A}^c} = \mathbf{0}$, where $\mathcal{A}^c$ is the complement of $\mathcal{A}$ in $[n]$ and $\mathbf{0}$ is the all-zero vector. Next, it outputs $\mathcal{E}(u) = vG_n$ as in (10). The decoder we will use for $\mathbb{C}_n(\mathcal{A})$ is the successive cancellation decoder of Arikan [3]. This decoder works as already described in (12), with one straightforward modification: for $i \in \mathcal{A}^c$, the decision rule is simply $\hat{v}_i = 0$.

¹ In the work of [3,4], which deals with symmetric as well as nonsymmetric DMCs, it is important to allow an arbitrary choice of these "frozen" values. However, it is also shown in [3] that in the case of symmetric channels, any choice is as good as any other. Since we are concerned exclusively with symmetric channels in this paper, we will use zeros for notational convenience.

The key property of the encoder-decoder pair of the foregoing paragraph is summarized in the following theorem. This theorem is (the second part of) Proposition 2 of Arikan [3].

Theorem 2. Let $W$ be a BSM channel and let $\mathcal{A}$ be an arbitrary subset of $[n]$ of size $|\mathcal{A}| = k$. Suppose that a message $U$ is chosen uniformly at random from $\{0,1\}^k$, encoded as a codeword of $\mathbb{C}_n(\mathcal{A})$, and transmitted over $W$. Then the probability that the channel output is not decoded to $U$ under successive cancellation decoding satisfies

$$\Pr\{\hat{U} \ne U\} \le \sum_{i\in\mathcal{A}} Z(W_i) \tag{15}$$

In this paper, we need a result that is somewhat stronger than Theorem 2, since we do not assume that messages are chosen uniformly at random from $\{0,1\}^k$. Fortunately, such a result can be readily established using the machinery that was already developed by Arikan [3] for symmetric channels. Indeed, for symmetric channels, it is well known that the error probability is independent of the transmitted codeword; hence, the input distribution should not matter. The following proposition makes this observation precise. We include a proof, for completeness.

Proposition 3. Let $W$ be a BSM channel and let $\mathcal{A}$ be an arbitrary subset of $[n]$ of size $|\mathcal{A}| = k$. Suppose that a message $U$ is chosen according to an arbitrary distribution on $\{0,1\}^k$, encoded as a codeword of $\mathbb{C}_n(\mathcal{A})$, and transmitted over $W$. Then the probability $P_e$ that the channel output is not decoded to $U$ under successive cancellation decoding satisfies

$$P_e \le \sum_{i\in\mathcal{A}} Z(W_i) \tag{16}$$



Proof. Following Arikan [3, Section V], we consider the sample space $\Omega_n = \{0,1\}^n \times \mathcal{Y}^n$ with the probability measure

$$\Pr\{(v,y)\} \overset{\text{def}}{=} 2^{-n}\, \widetilde{W}(y|v) \tag{17}$$

for all $v \in \{0,1\}^n$ and $y \in \mathcal{Y}^n$. On this probability space, Arikan [3] defines the event $\mathscr{E}$ of block error as the set of all pairs $(v,y)$ in $\Omega_n$ such that the channel output $y$ is not decoded to $v_{\mathcal{A}}$ under successive cancellation decoding. Let us further define, for all $w \in \{0,1\}^n$, the event

$$\mathscr{V}_w \overset{\text{def}}{=} \{ (v,y) \in \Omega_n : v = w \} \tag{18}$$

It is shown in [3, Proposition 2] that under the probability measure in (17) we have

$$\Pr\{\mathscr{E}\} \le \sum_{i\in\mathcal{A}} Z(W_i) \tag{19}$$

It is furthermore shown in [3, Section VI-B] that, provided ties in the decision rule (12) are broken at random, the events $\mathscr{E}$ and $\mathscr{V}_w$ are independent for all $w$. In other words,

$$\Pr\{\mathscr{E} \mid \mathscr{V}_w\} = \Pr\{\mathscr{E}\} \quad \text{for all } w \in \{0,1\}^n \tag{20}$$

Now consider the situation where the a priori distribution on the messages in $\{0,1\}^k$ is a delta-function. That is, a specific message $u \in \{0,1\}^k$ is always chosen with probability 1, and the input to the transformation in (5) is the vector $w$ with $w_{\mathcal{A}} = u$ and $w_{\mathcal{A}^c} = \mathbf{0}$. Let $P_e(u)$ denote the probability that the successive cancellation decoder does not decode the corresponding channel output $y$ to $w_{\mathcal{A}}$. Then it follows from (10) that

$$P_e(u) = \sum_{y \in \mathcal{F}(w_{\mathcal{A}})} \widetilde{W}(y|w) \tag{21}$$

where $\mathcal{F}(w_{\mathcal{A}})$ is the set of channel outputs that are not decoded to $w_{\mathcal{A}}$ by the successive cancellation decoder. Observe that

$$\mathscr{E} \cap \mathscr{V}_w = \{ (v,y) \in \Omega_n : v = w \text{ and } y \in \mathcal{F}(w_{\mathcal{A}}) \}$$

Therefore, the probability of the event $\mathscr{E} \cap \mathscr{V}_w$, under the probability measure in (17), can be expressed as

$$\Pr\{\mathscr{E} \cap \mathscr{V}_w\} = \sum_{(v,y) \in \mathscr{E} \cap \mathscr{V}_w} 2^{-n}\, \widetilde{W}(y|v) = 2^{-n} \sum_{y \in \mathcal{F}(w_{\mathcal{A}})} \widetilde{W}(y|w) \tag{22}$$

It can be readily seen from (17) and (18) that $\Pr\{\mathscr{V}_w\} = 2^{-n}$ for all $w$. Hence, it follows from (20) that

$$\Pr\{\mathscr{E} \cap \mathscr{V}_w\} = \Pr\{\mathscr{E} \mid \mathscr{V}_w\}\, \Pr\{\mathscr{V}_w\} = 2^{-n} \Pr\{\mathscr{E}\} \tag{23}$$

Combining (21), (22), and (23), we conclude that $P_e(u) = \Pr\{\mathscr{E}\}$. Since this holds for all $u \in \{0,1\}^k$, we have

$$P_e = \sum_{u\in\{0,1\}^k} P_e(u) \Pr\{U = u\} = \Pr\{\mathscr{E}\}$$

for any probability distribution $\Pr\{U = u\}$ on $\{0,1\}^k$. Hence, the proposition now follows from (19). □

In order to establish our main result in Section V, we also need to consider a slightly different encoding and decoding scenario. In this variation, the encoder $\mathcal{E}'$ is no longer deterministic. Rather, it has access to random bits and selects $V_{\mathcal{A}^c}$ at random, according to some fixed (but otherwise arbitrary) probability distribution on $\{0,1\}^{n-k}$. In all other respects, $\mathcal{E}'$ is identical to the encoder $\mathcal{E}$ for $\mathbb{C}_n(\mathcal{A})$ described above. The specific realization $v_{\mathcal{A}^c}$ of $V_{\mathcal{A}^c}$ is revealed to the decoder by a genie. The decoder uses successive cancellation, with the suitable modification: for $i \in \mathcal{A}^c$, the value of $\hat{v}_i$ is not set to zero, but rather to the corresponding coordinate in the realization $v_{\mathcal{A}^c}$ (revealed by the genie). Let $P'_e$ denote the probability of block error in this scenario. It is shown in [3, Section VI] that Theorem 2 still applies in this case, namely

$$P'_e \le \sum_{i\in\mathcal{A}} Z(W_i) \tag{24}$$



The coding scheme presented in the next section relies crucially on one more result from the literature on polar codes. The following lemma was proved by Korada in [19, Lemma 4.7].

Lemma 4. Let $W$ and $W^*$ be BSM channels such that $W$ is degraded with respect to $W^*$. For $n = 2^m$, let $W_1, W_2, \ldots, W_n$ and $W_1^*, W_2^*, \ldots, W_n^*$ denote the $n$ corresponding bit-channels. Then $W_i$ is degraded with respect to $W_i^*$ for all $i = 1, 2, \ldots, n$, and therefore $C(W_i) \le C(W_i^*)$ and $Z(W_i) \ge Z(W_i^*)$.

It follows immediately from Lemma 4 and (13) that if $W$ is degraded with respect to $W^*$, then the set of good bit-channels for $W$ is a subset of the set of good bit-channels for $W^*$. More precisely, we have $\mathcal{G}_n(W,\beta) \subseteq \mathcal{G}_n(W^*,\beta)$ for all constants $\beta$.
IV. The Coding Scheme

We consider a special case of the wiretap-channel system of Figure 1, wherein both the main channel $C_1 = (\{0,1\}, \mathcal{Y}, W^*)$ and Eve's wiretap channel $C_2 = (\{0,1\}, \mathcal{Z}, W)$ are symmetric DMCs, and $C_2$ is degraded with respect to $C_1$. The proposed coding scheme is illustrated informally below:



[Diagram (25): the inputs to a row-permuted version of $G^{\otimes m}$ fall into three groups. Random bits enter the bit-channels good for both Bob and Eve, information bits enter the bit-channels good for Bob but bad for Eve, and the bit-channels bad for both Bob and Eve receive fixed zeros. The outputs are $X_1, X_2, \ldots, X_n$.]

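The general idea is to transmit information only over those bit-channels that are good for Bob but bad for Eve, while flooding those bit-channels that are good for Eve with random bits. Formally, we fix a positive constant $\beta < 1/2$ and define three subsets of $[n]$ as follows:

$$\mathcal{R} \overset{\text{def}}{=} \mathcal{G}_n(W,\beta) \tag{26}$$

$$\mathcal{A} \overset{\text{def}}{=} \mathcal{G}_n(W^*,\beta) \setminus \mathcal{G}_n(W,\beta) \tag{27}$$

$$\mathcal{B} \overset{\text{def}}{=} \mathcal{B}_n(W^*,\beta) \tag{28}$$

Notice that the sets $\mathcal{R}, \mathcal{A}, \mathcal{B}$ are disjoint and $\mathcal{R} \cup \mathcal{A} \cup \mathcal{B} = [n]$. This is so since $\mathcal{G}_n(W^*,\beta)$ and $\mathcal{B}_n(W^*,\beta)$ are complements of each other by definition, and $\mathcal{G}_n(W,\beta) \subseteq \mathcal{G}_n(W^*,\beta)$ by Lemma 4. Let $|\mathcal{R}| = r$ and $|\mathcal{A}| = k$. We are now ready to describe the proposed encoding and decoding algorithms.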
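The partition (26)-(28) can be illustrated with erasure channels, for which the bit-channel parameters are exact. As an assumption for this sketch, take $W^* = \mathrm{BEC}(\varepsilon_B)$ for Bob and $W = \mathrm{BEC}(\varepsilon_E)$ for Eve with $\varepsilon_B \le \varepsilon_E$; then $W$ is degraded with respect to $W^*$, both channels are symmetric, and (8) gives $C_s = \varepsilon_E - \varepsilon_B$. The helper names `bec_bit_channel_z`, `good_set`, and `partition` are hypothetical, and indices are 1-based as in the text.

```python
# Hypothetical helper names; sets R, A, B of (26)-(28) for a pair of BECs.


def bec_bit_channel_z(eps, m):
    """Exact Z(W_i), i = 1..n, for BEC(eps), via Z- = 2Z - Z^2 and Z+ = Z^2."""
    z = [eps]
    for _ in range(m):
        z = [w for zj in z for w in (2 * zj - zj * zj, zj * zj)]
    return z


def good_set(z, beta):
    """1-based indices i with Z(W_i) < 2^{-n^beta} / n, as in (13)."""
    n = len(z)
    threshold = 2.0 ** (-(n ** beta)) / n
    return {i + 1 for i, zi in enumerate(z) if zi < threshold}


def partition(eps_bob, eps_eve, m, beta):
    """Return (R, A, B) for W* = BEC(eps_bob) and W = BEC(eps_eve)."""
    n = 2 ** m
    good_bob = good_set(bec_bit_channel_z(eps_bob, m), beta)   # G_n(W*, beta)
    good_eve = good_set(bec_bit_channel_z(eps_eve, m), beta)   # G_n(W, beta)
    R = good_eve                                               # (26)
    A = good_bob - good_eve                                    # (27)
    B = set(range(1, n + 1)) - good_bob                        # (28)
    return R, A, B


eps_bob, eps_eve, m, beta = 0.2, 0.5, 12, 0.4
R, A, B = partition(eps_bob, eps_eve, m, beta)
n = 2 ** m
print(len(R) / n, len(A) / n, len(B) / n, "C_s =", eps_eve - eps_bob)
```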

Encoding Algorithm: Formally, the encoder is a function $\mathcal{E} : \{0,1\}^k \times \{0,1\}^r \to \{0,1\}^n$. It accepts as input a message $u \in \{0,1\}^k$ and a vector $e \in \{0,1\}^r$. We make no assumptions about $u$ at this point, but we assume that $e$ is selected by Alice uniformly at random from $\{0,1\}^r$. The encoder first constructs the vector $v \in \{0,1\}^n$ by setting $v_{\mathcal{R}} = e$, $v_{\mathcal{A}} = u$, and $v_{\mathcal{B}} = 0$. The encoder then outputs $\mathcal{E}(u,e) := vG_n = vP_nG^{\otimes m}$, as in (5).

Decoding Algorithm: Formally, the decoder is a function $\mathcal{D} : \mathcal{Y}^n \to \{0,1\}^k$. It accepts as input a vector $y \in \mathcal{Y}^n$ at the output of the main channel $\mathcal{C}_1 = (\{0,1\}, \mathcal{Y}, W^*)$. It then invokes successive cancellation decoding for the polar code $\mathbb{C}_n(\mathcal{A} \cup \mathcal{R})$, used over $W^*$, to produce the vector $\hat{v} \in \{0,1\}^n$. The decoder outputs $\mathcal{D}(y) := \hat{v}_{\mathcal{A}}$.
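Once the sets are known, the encoder takes only a few lines. The sketch below builds $v$ and computes $vG^{\otimes m}$ over GF(2) with the usual $O(n \log n)$ butterfly. It assumes $P_n = I$.

`polar_transform` and `encode` are hypothetical names, and the toy sets are arbitrary. The successive cancellation decoder is not shown.

```python
import secrets


def polar_transform(v):
    """Return x = v G^{(x)m} over GF(2), G = [[1, 0], [1, 1]], in O(n log n).

    len(v) must be a power of two; the row permutation P_n is taken to be
    the identity here (an assumption made for illustration only).
    """
    x = list(v)
    n = len(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x[j] ^= x[j + h]  # butterfly: (a, b) -> (a + b, b)
        h *= 2
    return x


def encode(u, e, A, R, n):
    """Hypothetical encoder E(u, e): v_A = u, v_R = e, v_B = 0, output v G_n."""
    A, R = sorted(A), sorted(R)
    if len(u) != len(A) or len(e) != len(R):
        raise ValueError("u and e must have lengths |A| and |R|")
    v = [0] * n  # every index outside A and R belongs to B and stays 0
    for i, bit in zip(A, u):
        v[i] = bit
    for i, bit in zip(R, e):
        v[i] = bit
    return polar_transform(v)


# Toy sets for n = 8 (hypothetical, not from the text): A = {5, 6}, R = {7}.
A, R = {5, 6}, {7}
e = [secrets.randbits(1) for _ in R]  # Alice's fresh uniform random bits
print(encode([1, 0], e, A, R, 8))
```

The butterfly uses the fact that entry $(i, j)$ of $G^{\otimes m}$ equals 1 exactly when the binary digits of $j$ are a subset of those of $i$.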

We defer the proof that this coding scheme satisfies the reliability and security conditions (1) and (2) to the next section. The rest of this section is devoted to two remarks about our construction.

Remark. We point out that our encoding algorithm can be regarded as a special case of the coset-coding method described in Section I-A. Recall that the coset-coding scheme is based upon an outer code $\mathbb{C}^*$ that provides error correction on the main channel and an inner code $\mathbb{C} \subset \mathbb{C}^*$ that ensures security for the wiretap channel. In our encoding algorithm, the outer code is $\mathbb{C}^* = \mathbb{C}_n(\mathcal{A} \cup \mathcal{R})$ and the inner code is $\mathbb{C} = \mathbb{C}_n(\mathcal{R})$. □



Remark. Given the channel polarization phenomenon, it is intuitively clear from (25) why the proposed coding scheme should work. The information bits $U = (U_1, U_2, \ldots, U_k)$ reach Bob via good (almost noiseless) bit-channels. Thus Bob should be able to reconstruct them with very high probability. On the other hand, these same bits pass through bad (almost pure-noise) bit-channels on their way to Eve. Thus Eve should not be able to deduce much information about $U$ from her observations $Z$, and $H(U|Z)$ should be close to $H(U)$.

However, this simple intuition is misleading, because it does not show how the random bits in (25) help keep Eve ignorant. It may appear that this randomness is not really needed. For example, what would happen if the vector $e$ that serves as the second input to our encoder function $\mathcal{E}(\cdot,\cdot)$ is not chosen at random from $\{0,1\}^r$ but rather set to an a priori fixed value? Since the channels are symmetric, any fixed value is as good as any other, so we may as well assume $e = 0$. This does not seem to affect the argument in the foregoing paragraph and, according to this argument, $H(U|Z)$ would still be close to $H(U)$.

In fact, this is not true. The reason is that the channels seen by individual input bits as they undergo the transformation in (5) depend on the distribution of the other input bits. Specifically, if $e = 0$ (or, more generally, if $e$ is fixed), the resulting encoder will not be secure. This is an important point that we would like to establish rigorously. To do so, we first need the following simple lemma.

Lemma 5. Let $\mathcal{S}$ be an arbitrary subset of $[n]$ of size $k$, and suppose that the polar code $\mathbb{C}_n(\mathcal{S})$ is used to communicate over a BSM channel $(\{0,1\}, \mathcal{Z}, W)$. Further, assume that the message $U$ at the input to the encoder for $\mathbb{C}_n(\mathcal{S})$ is uniformly random over $\{0,1\}^k$, and let $Z$ denote the random vector at the channel output. Then $I(U;Z) \ge kC(W)$.

Proof. In fact, the lemma is true not only for $\mathbb{C}_n(\mathcal{S})$ but for any binary linear code of dimension $k$. The only property of the transform matrix $G_n$ that we need is that it is nonsingular.

Let $V$ be the random vector obtained by setting $V_{\mathcal{S}} = U$ and $V_{\mathcal{S}^c} = 0$. Then the codeword transmitted over the channel is

$$X = VG_n = UM \tag{29}$$

where $M$ is a $k \times n$ row submatrix of $G_n$. Since $G_n$ is nonsingular, $\mathrm{rank}(M) = k$ and there exists a subset $\mathcal{T}$ of $[n]$ of size $k$ such that the corresponding $k$ columns of $M$ are linearly independent. This implies that there is a one-to-one correspondence between $U$ and $X_{\mathcal{T}}$. Hence, $I(U;Z) = I(X_{\mathcal{T}};Z)$. Furthermore, since the random vector $U$ is uniform over $\{0,1\}^k$, so is the vector $X_{\mathcal{T}}$. Equivalently, its components $\{X_i : i \in \mathcal{T}\}$ are i.i.d. $\mathrm{Ber}(1/2)$ random variables, and we can further conclude that

$$I(X_{\mathcal{T}};Z) \ge I(X_{\mathcal{T}};Z_{\mathcal{T}}) = \sum_{i\in\mathcal{T}} I(X_i;Z_i) = kC(W)$$

where the last two equalities follow from the fact that the channel $(\{0,1\}, \mathcal{Z}, W)$ is memoryless and symmetric. ■

Now suppose that the input to our encoder function $\mathcal{E}(\cdot,\cdot)$ is a message chosen uniformly at random from $\{0,1\}^k$ along with $e = 0$. This is a special case of the situation considered in Lemma 5, with the set $\mathcal{S}$ given by (27). Hence $I(U;Z) \ge kC(W)$, and the security condition (2) cannot be satisfied: a significant fraction of the message bits, at least $C(W)$, is potentially exposed. □
We note that the foregoing remark illustrates a general result. It is known that (2) cannot be satisfied unless the encoder makes use of at least $I(X;Z)$ random bits, where $I(X;Z)$ is the mutual information between the input and output of Eve's channel [27].
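The effect described in the preceding remark can be checked numerically for small $n$ when Eve's channel is a BEC. That is an assumption made only for this example.

Over BEC($\epsilon$), Eve learns exactly the unerased coordinates $X_{\mathcal{S}}$. For uniform $U$, the leakage $I(U;Z)$ is therefore an average of GF(2) ranks over the $2^n$ erasure patterns. The sketch below compares the $e = 0$ encoder with the randomized one for $n = 8$. All names are hypothetical, and $P_n = I$ is assumed.

```python
def gf2_rank(rows):
    """Rank over GF(2) of row vectors given as integer bitmasks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)  # clears b's leading bit from r if it is set
        if r:
            basis.append(r)
    return len(basis)


def polar_rows(n):
    """Rows of G^{(x)m} as bitmasks: entry (i, j) is 1 iff i & j == j (P_n = I)."""
    return [sum(1 << j for j in range(n) if i & j == j) for i in range(n)]


def bec_leakage(A, R, eps, n, random_bits=True):
    """Exact I(U;Z) in bits when Eve sees X = v G through BEC(eps).

    U is uniform on v_A and v_B = 0. If random_bits, v_R is uniform (the
    scheme of Section IV); otherwise v_R = 0 (the e = 0 encoder of the remark).
    Over a BEC, Z reveals exactly X_S for the unerased set S, so
    I(U; X_S) = rank(G_{A+R,S}) - rank(G_{R,S}), or rank(G_{A,S}) if e = 0.
    """
    G = polar_rows(n)
    total = 0.0
    for S in range(1 << n):  # bitmask of unerased positions
        s = bin(S).count("1")
        prob = (1 - eps) ** s * eps ** (n - s)
        rows_A = [G[i] & S for i in A]
        rows_R = [G[i] & S for i in R]
        if random_bits:
            info = gf2_rank(rows_A + rows_R) - gf2_rank(rows_R)
        else:
            info = gf2_rank(rows_A)
        total += prob * info
    return total


# Eve: BEC(1/2), n = 8; Bob noiseless, so B is empty. With this indexing the
# bit-channels 3, 5, 6, 7 are the four best ones for Eve (hypothetical toy choice).
A, R, eps, n = [0, 1, 2, 4], [3, 5, 6, 7], 0.5, 8
leak_fixed = bec_leakage(A, R, eps, n, random_bits=False)
leak_random = bec_leakage(A, R, eps, n, random_bits=True)
print(leak_fixed, leak_random)
```

With these toy sets, Lemma 5 guarantees that the fixed-$e$ value is at least $kC(W) = 2$ bits. The randomized value cannot exceed $\sum_{i\in\mathcal{A}} C(W_i) = 0.6328125$ bits, which is Lemma 15 of Section VI applied with $\mathcal{R}^c = \mathcal{A}$.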

V. Weak Security 

In this section, we prove that the coding scheme of the previous section satisfies the reliability and security conditions (1) and (2), while its rate $k/n$ approaches the secrecy capacity.

The reliability of this coding scheme follows immediately from Proposition 3. Let $\hat{V}$ denote the random vector at the output of the successive cancellation decoder for $\mathbb{C}_n(\mathcal{A} \cup \mathcal{R})$ invoked by our decoding algorithm, and note that the corresponding probability of block error is upper-bounded by (16). Since $\hat{U} = \hat{V}_{\mathcal{A}}$ and $\mathcal{A} \cup \mathcal{R} = \mathcal{G}_n(W^*,\beta)$ by design, we see that

$$\Pr\{\hat{U} \ne U\} \le \sum_{i \in \mathcal{A}\cup\mathcal{R}} Z(W_i^*) < 2^{-n^\beta} \tag{30}$$

Since $n \ge k$, this clearly implies that $\lim_{k\to\infty} \Pr\{\hat{U} \ne U\} = 0$, as required in (1), and the reliability condition is satisfied.

We now turn to the proof of security. For the remainder of this section, the sets $\mathcal{R}$, $\mathcal{A}$, $\mathcal{B}$ are given by (26)-(28), $U$ denotes Alice's message, $V$ denotes the intermediate vector constructed by our encoding algorithm (with $V_{\mathcal{A}} = U$, $V_{\mathcal{B}} = 0$, and $V_{\mathcal{R}}$ uniform over $\{0,1\}^r$), and $Z$ denotes Eve's observations. Also $|\mathcal{A}| = k$, $|\mathcal{R}| = r$, and $h_2(\cdot)$ is the binary entropy function.
Lemma 6.

$$H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}}) \le h_2\big(2^{-n^\beta}\big) + r\,2^{-n^\beta}$$

Proof. Suppose that in addition to her observations $Z$, a genie reveals to Eve the realization $v_{\mathcal{A}}$ of $V_{\mathcal{A}}$ and asks her to produce an estimate of $V_{\mathcal{R}}$. Eve also knows that $V_{\mathcal{B}} = 0$. Thus she knows all the bits of $V_{\mathcal{R}^c}$. Since $\mathcal{R} = \mathcal{G}_n(W,\beta)$, this is precisely the scenario considered in (24). Consequently, Eve can use successive cancellation decoding to deterministically compute an estimate $\hat{V}_{\mathcal{R}} = f(Z, V_{\mathcal{A}})$ such that

$$\lambda \overset{\text{def}}{=} \Pr\{\hat{V}_{\mathcal{R}} \ne V_{\mathcal{R}}\} \le \sum_{i\in\mathcal{R}} Z(W_i) < 2^{-n^\beta} \tag{31}$$

We now invoke Fano's inequality [11, p. 38] to bound the conditional entropy $H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}})$ in terms of $\lambda$ as follows:

$$H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}}) \le h_2(\lambda) + r\lambda \tag{32}$$

where we have also used the fact that $V_{\mathcal{R}}$ takes values in the set $\{0,1\}^r$ of size $2^r$. The lemma now follows from (31), (32), and the fact that $h_2(\cdot)$ is increasing on the interval $[0, 1/2]$. ■

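Let us define $\epsilon_n = C(W) - |\mathcal{R}|/n$. Since $\mathcal{R} = \mathcal{G}_n(W,\beta)$, Theorem 1 implies that $\lim_{n\to\infty} \epsilon_n = 0$.

Lemma 7.

$$I(U;Z) \le n\epsilon_n + h_2\big(2^{-n^\beta}\big) + (n-k)\,2^{-n^\beta} \tag{33}$$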



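Proof. The lemma is proved via a long sequence of simple equalities and inequalities, as follows:

$$\begin{aligned}
I(U;Z) &= I(V_{\mathcal{A}};Z) = I(V_{\mathcal{A}\cup\mathcal{B}};Z) \qquad (34)\\
&= I(V;Z) - I(V_{\mathcal{R}};Z \mid V_{\mathcal{A}\cup\mathcal{B}}) \qquad (35)\\
&= I(V;Z) - I(V_{\mathcal{R}};Z \mid V_{\mathcal{A}}) \qquad (36)\\
&= I(V;Z) - H(V_{\mathcal{R}} \mid V_{\mathcal{A}}) + H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}}) \qquad (37)\\
&= I(V;Z) - H(V_{\mathcal{R}}) + H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}}) \qquad (38)\\
&= I(V;Z) - r + H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}}) \qquad (39)\\
&\le nC(W) - r + H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}}) \qquad (40)\\
&= n\epsilon_n + H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}}) \qquad (41)\\
&\le n\epsilon_n + h_2\big(2^{-n^\beta}\big) + r\,2^{-n^\beta} \qquad (42)\\
&\le n\epsilon_n + h_2\big(2^{-n^\beta}\big) + (n-k)\,2^{-n^\beta} \qquad (43)
\end{aligned}$$

The equalities in (34) hold since $V_{\mathcal{A}} = U$ and $V_{\mathcal{B}} = 0$. Now observe that $\mathcal{A} \cup \mathcal{B}$ and $\mathcal{R}$ are complements of each other in $[n]$. Hence, any distribution of $V$ can be thought of as a joint distribution of $V_{\mathcal{R}}$ and $V_{\mathcal{A}\cup\mathcal{B}}$. Given this observation, (35) follows from the chain rule for mutual information. The equality in (36) is trivial from $V_{\mathcal{B}} = 0$, while (37) is the definition of conditional mutual information. The equalities (38) and (39) hold since $V_{\mathcal{R}}$ and $V_{\mathcal{A}}$ are independent, and $V_{\mathcal{R}}$ is a priori uniform over $\{0,1\}^r$. Inequality (40) is immediate from the fact that $C(W)$ is the capacity of $W$, while (41) follows from the definition of $\epsilon_n$. Finally, (42) follows from Lemma 6, and (43) holds because $r \le n - k$. ■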
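The following sketch evaluates the right-hand side of (33) for two BECs, which shows how slowly the bound becomes useful. It also prints the ratio that Theorem 8 drives to zero. The BEC pair is an assumption; the text allows any symmetric pair with $W$ degraded with respect to $W^*$.

Since $r = |\mathcal{G}_n(W,\beta)|$ and, by Lemma 4, $k = |\mathcal{G}_n(W^*,\beta)| - r$, every quantity follows from the bit-channel Bhattacharyya parameters. Function names and parameter values are illustrative only.

```python
import math


def h2(p):
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bec_bhattacharyya(eps, m):
    """Exact Z(W_i) of BEC(eps), n = 2**m; index bits MSB first (0: minus, 1: plus)."""
    z = [eps]
    for _ in range(m):
        z = [w for x in z for w in (2 * x - x * x, x * x)]
    return z


def lemma7_bound(eps_main, eps_eve, m, beta):
    """Right-hand side of (33) for Bob's BEC(eps_main) and Eve's BEC(eps_eve).

    Returns (bound, k) with r = |G_n(W, beta)|, k = |G_n(W*, beta)| - r (by
    Lemma 4), and eps_n = C(W) - r/n where C(W) = 1 - eps_eve.
    """
    n = 2 ** m
    lam = 2.0 ** (-(n ** beta))      # 2^{-n^beta}
    threshold = lam / n              # threshold of G_n in (13)
    r = sum(1 for z in bec_bhattacharyya(eps_eve, m) if z < threshold)
    k = sum(1 for z in bec_bhattacharyya(eps_main, m) if z < threshold) - r
    eps_n = (1 - eps_eve) - r / n
    return n * eps_n + h2(lam) + (n - k) * lam, k


for m in range(8, 17, 2):
    bound, k = lemma7_bound(0.1, 0.5, m, beta=0.3)
    print(m, k, round(bound, 2), round(bound / k, 4))  # last column bounds I(U;Z)/k
```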
Theorem 8. The encoding algorithm of the previous section satisfies the weak security condition (2), namely

$$\lim_{k\to\infty} \frac{I(U;Z)}{k} = 0$$

Proof. This follows immediately from Lemma 7. Divide both sides of (33) by $n$ to get

$$\frac{I(U;Z)}{n} \le \epsilon_n + \frac{h_2\big(2^{-n^\beta}\big)}{n} + \frac{n-k}{n}\,2^{-n^\beta} \tag{44}$$

It is clear that the last two terms in (44) tend to zero as $n \to \infty$, and $\lim_{n\to\infty} \epsilon_n = 0$ by Theorem 1. Along with the obvious fact that $k = \Omega(n)$, this completes the proof of the theorem. ■

Recall that for the wiretap-channel systems considered in this paper, the secrecy capacity is $C_s = C(W^*) - C(W)$ (cf. Section II). Let $R_n = k/n$ denote the rate of our coding scheme.

Theorem 9.

$$\lim_{n\to\infty} R_n = C(W^*) - C(W)$$

Proof. Observe that

$$R_n = \frac{|\mathcal{A}|}{n} = \frac{|\mathcal{G}_n(W^*,\beta)|}{n} - \frac{|\mathcal{G}_n(W,\beta)|}{n} \tag{45}$$

where we have used the definition of $\mathcal{A}$ in (27) and the fact that $\mathcal{G}_n(W,\beta)$ is a subset of $\mathcal{G}_n(W^*,\beta)$ for all $\beta < 1/2$. The theorem now follows from Theorem 1. ■
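Equation (45) reduces the rate to two counts of good bit-channels. Below is a small sketch for a pair of BECs. The BEC pair is an illustrative assumption; with erasure probabilities 0.1 and 0.5, $C(W^*) - C(W) = 0.4$.

The sketch also checks the inclusion $\mathcal{G}_n(W,\beta) \subseteq \mathcal{G}_n(W^*,\beta)$ used in the proof. Convergence to the limit can be slow, so the printed rates are best read as a trend.

```python
def bec_bhattacharyya(eps, m):
    """Exact Z(W_i) of BEC(eps), n = 2**m; index bits MSB first (0: minus, 1: plus)."""
    z = [eps]
    for _ in range(m):
        z = [w for x in z for w in (2 * x - x * x, x * x)]
    return z


def weak_rate(eps_main, eps_eve, m, beta):
    """R_n from (45): |G_n(W*, beta)|/n - |G_n(W, beta)|/n for two BECs."""
    n = 2 ** m
    threshold = 2.0 ** (-(n ** beta)) / n
    good_bob = {i for i, z in enumerate(bec_bhattacharyya(eps_main, m)) if z < threshold}
    good_eve = {i for i, z in enumerate(bec_bhattacharyya(eps_eve, m)) if z < threshold}
    assert good_eve <= good_bob, "Lemma 4 inclusion should hold for degraded BECs"
    return (len(good_bob) - len(good_eve)) / n


secrecy_capacity = (1 - 0.1) - (1 - 0.5)  # C(W*) - C(W) for BEC(0.1) and BEC(0.5)
for m in (8, 12, 16):
    print(m, round(weak_rate(0.1, 0.5, m, beta=0.3), 4), secrecy_capacity)
```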

Theorem 9 does not directly imply that our coding scheme achieves secrecy capacity, because the rate of communication from Alice to Bob, measured in information bits per channel use, could be much less than $k/n$ when $H(U) < k$. But this is true for any encoder that converts $k$ input bits to $n$ coded output bits. If the encoder accommodates an arbitrary distribution on its input $U$, it can achieve capacity only when $H(U) = k(1 - o(1))$. If this necessary condition is satisfied, then our coding scheme does achieve secrecy capacity by Theorem 9.



MAHDAVIFAR and VARDY: ACHIEVING THE SECRECY CAPACITY OF WIRETAP CHANNELS USING POLAR CODES



VI. Strong Security 

This section shows how polar coding could be used to provide strong security whenever the main channel $\mathcal{C}_1$ and the wiretap channel $\mathcal{C}_2$ are symmetric binary-input DMCs, and $\mathcal{C}_2$ is degraded with respect to $\mathcal{C}_1$. Specifically, we describe a polar coding scheme that satisfies the strong security condition (3), while operating at a rate $k/n$ that approaches the secrecy capacity.

First, we will introduce a subtle but important change in the coding scheme of Section IV (see Section VI-B). In order to show that this change suffices to guarantee strong security, we need to replace the proof in the previous section by a more intricate argument (see Section VI-D). This argument relies crucially on the fact that a certain composite channel induced by our construction is symmetric (Section VI-C). Aided by a recent result of Hassani and Urbanke [16], we then prove that the rate of the proposed coding scheme approaches the secrecy capacity (Section VI-E). Unfortunately, we can show that the reliability condition (1) is satisfied only for the case where the main channel is noiseless. Nevertheless, we believe that, in practice, low probabilities of block error can be achieved also when the main channel is not noiseless, using a modification of Arıkan's successive cancellation decoder (Section VI-F).

A. Analysis of the Weak-Security Coding Scheme 

Henceforth, let us refer to the coding scheme introduced in Section IV and analyzed in the previous section as the weak-security coding scheme. A natural question is whether this coding scheme does, in fact, provide strong security. Although we do not have a definitive answer to this question, we conjecture that it does not. As before, let

$$\epsilon_n \overset{\text{def}}{=} C(W) - \frac{|\mathcal{R}|}{n} = C(W) - \frac{|\mathcal{G}_n(W,\beta)|}{n} \tag{46}$$

with the sets $\mathcal{R}$ and $\mathcal{G}_n(W,\beta)$ defined as in (26) and (13). The following result provides some evidence for this conjecture.

Proposition 10. Whenever the wiretap channel is a binary-input symmetric DMC and the main channel is noiseless, the weak-security coding scheme achieves strong security if and only if

$$\lim_{n\to\infty} n\epsilon_n = 0 \tag{47}$$

Proof. The fact that (47) is sufficient for strong security is obvious from Lemma 7, since the last two terms in (33) tend to zero exponentially fast. In fact, it is clear that (47) is sufficient for strong security, whether the main channel is noiseless or not. We now show that this condition is also necessary, at least in the case where the main channel is noiseless. In this case, the set $\mathcal{B}$ in (28) is empty, and the vector $V$ at the input to the transformation (5) consists of $V_{\mathcal{A}} = U$ and $V_{\mathcal{R}}$, with $V_{\mathcal{R}}$ being uniform over $\{0,1\}^r$. Hence, if the message $U$ is a priori uniform over $\{0,1\}^k$, then $V$ is uniform over $\{0,1\}^n$. Let $X = VG_n$ as in (5). Since the Arıkan transform matrix $G_n$ is nonsingular, $X$ is also uniform over $\{0,1\}^n$. Consequently, we have

$$I(V;Z) = I(X;Z) = \sum_{i=1}^{n} I(X_i;Z_i) = nC(W) \tag{48}$$

where the second equality follows by noting that $X_1, X_2, \ldots, X_n$ are i.i.d. $\mathrm{Ber}(1/2)$ random variables and $W$ is memoryless, while the last equality follows from the fact that $W$ is symmetric. This implies that the inequality (40) in the proof of Lemma 7 becomes an equality in this case. Therefore

$$I(U;Z) = n\epsilon_n + H(V_{\mathcal{R}} \mid Z, V_{\mathcal{A}}) \ge n\epsilon_n \tag{49}$$

It is now clear that $\lim_{n\to\infty} n\epsilon_n = 0$ is necessary for the mutual information $I(U;Z)$ to vanish asymptotically. ■

Given a BSM channel $W$, is it true that $\lim_{n\to\infty} n\epsilon_n = 0$ for this channel? Unfortunately, the answer to this question is negative. It is known [34, 36] that for any discrete memoryless channel $W$ and any code of length $n$ and rate $R$ that achieves error probability $P_e$ on $W$, we have

$$R \le C(W) - \frac{\mathrm{const}(P_e, W)}{\sqrt{n}} + O\!\left(\frac{\log n}{n}\right)$$

where the constant (which is given explicitly in [34]) depends on $W$ and $P_e$, but not on $n$. This implies that $n\epsilon_n = \Omega(\sqrt{n})$, and the weak-security coding scheme does not provide strong security. Consequently, in order to provide strong security, the polar coding scheme of Section IV has to be modified.
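For intuition about Proposition 10, the quantity $n\epsilon_n = nC(W) - |\mathcal{G}_n(W,\beta)|$ can be computed exactly for a BEC. That is an assumption made only for this example.

The sketch below prints $n\epsilon_n$ for a few block lengths. The argument above predicts that it grows with $n$ rather than decaying to zero. The printed values are an illustration, not a proof.

```python
def bec_bhattacharyya(eps, m):
    """Exact Z(W_i) of BEC(eps), n = 2**m; index bits MSB first (0: minus, 1: plus)."""
    z = [eps]
    for _ in range(m):
        z = [w for x in z for w in (2 * x - x * x, x * x)]
    return z


def n_eps_n(eps, m, beta):
    """n * eps_n = n C(W) - |G_n(W, beta)|, from (46), for Eve's BEC(eps)."""
    n = 2 ** m
    threshold = 2.0 ** (-(n ** beta)) / n
    good = sum(1 for z in bec_bhattacharyya(eps, m) if z < threshold)
    return n * (1 - eps) - good


for m in range(8, 17, 2):
    print(m, 2 ** m, n_eps_n(0.5, m, beta=0.3))
```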



B. Strong-Security Coding Scheme 

Intuitively, the main reason that the coding scheme of Section IV fails to provide strong security is this: the bit-channels that are deemed bad for Eve are not bad enough. Indeed, according to the definition of $\mathcal{B}_n(W,\beta)$ in (14), a bit-channel $W_i$ is considered bad for Eve whenever $Z(W_i) \ge 2^{-n^\beta}/n$. For example, if $n = 2^{10}$ and $\beta = 0.499$, a bit-channel may be declared bad for Eve even when its capacity is greater than $1 - 10^{-9}$ while Eve's probability of error on this channel is less than $2 \cdot 10^{-10}$. It is obvious that such a bit-channel does not prevent Eve from deducing the information at its input with high probability.

The problem is that the generally accepted definitions of good and bad bit-channels (for example, (13) and (14)) are motivated by Bob's point of view. First, a criterion for "goodness" is established, motivated by the probability of error on Bob's side; then the bit-channels that do not satisfy this criterion are deemed bad. In order to achieve strong security, we will redefine things from Eve's point of view. First, we introduce a strong criterion for "badness," and then make sure that random bits are sent over all the bit-channels that do not satisfy this criterion. Specifically, given a BSM channel $W$ and a positive $\delta < 1$, we define the index set of $\delta$-poor bit-channels as follows:

$$\mathcal{P}_n(W,\delta) \overset{\text{def}}{=} \{\, i \in [n] : C(W_i) \le \delta \,\} \tag{50}$$



Further, we leave the definition of the good bit-channels in (13) unchanged, but redefine the sets $\mathcal{R}$, $\mathcal{A}$, and $\mathcal{B}$ as follows:

$$\mathcal{R} \overset{\text{def}}{=} [n] \setminus \mathcal{P}_n(W,\delta_n) \tag{51}$$
$$\mathcal{A} \overset{\text{def}}{=} \mathcal{P}_n(W,\delta_n) \cap \mathcal{G}_n(W^*,\beta) \tag{52}$$
$$\mathcal{B} \overset{\text{def}}{=} \mathcal{P}_n(W,\delta_n) \setminus \mathcal{G}_n(W^*,\beta) \tag{53}$$

where, for the time being, $\delta_n$ is an arbitrary function from the positive integers to the interval $(0,1)$. We will specify this function precisely later in this section (cf. Theorem 17 and Proposition 20). In the meantime, notice that the sets $\mathcal{R}$, $\mathcal{A}$, and $\mathcal{B}$, as defined in (51)-(53), are still disjoint and $\mathcal{R} \cup \mathcal{A} \cup \mathcal{B} = [n]$.

Figure 2. Informal sketch of the strong-security coding scheme
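The redefined sets (51)-(53) can be computed the same way for BECs, again as an illustrative assumption. The $\delta$-poor test compares capacities very close to 0, and $1 - Z$ loses that precision in floating point. The sketch therefore tracks $C(W_i) = 1 - Z(W_i)$ directly alongside $Z(W_i)$, using the exact BEC updates.

The choice $\delta_n = 2^{-n^\beta}$ in the usage lines is one admissible security function. All names are hypothetical.

```python
def bec_parameters(eps, m):
    """Return (z, c): z[i] = Z(W_i) and c[i] = C(W_i) = 1 - z[i] for BEC(eps).

    Both are updated directly so that tiny capacities keep their precision:
    minus: (z, c) -> (z(2 - z), c^2); plus: (z, c) -> (z^2, c(2 - c)).
    Index bits are read MSB first, matching x = v G^{(x)m} with P_n = I.
    """
    pairs = [(eps, 1.0 - eps)]
    for _ in range(m):
        pairs = [q for (z, c) in pairs
                 for q in ((z * (2 - z), c * c), (z * z, c * (2 - c)))]
    return [z for z, _ in pairs], [c for _, c in pairs]


def strong_security_sets(eps_main, eps_eve, m, beta, delta):
    """Sets R, A, B of (51)-(53) for Bob's BEC(eps_main) and Eve's BEC(eps_eve)."""
    n = 2 ** m
    z_bob, _ = bec_parameters(eps_main, m)
    _, c_eve = bec_parameters(eps_eve, m)
    threshold = 2.0 ** (-(n ** beta)) / n
    good_bob = {i for i in range(n) if z_bob[i] < threshold}  # G_n(W*, beta), (13)
    poor_eve = {i for i in range(n) if c_eve[i] <= delta}     # P_n(W, delta), (50)
    R = set(range(n)) - poor_eve
    A = poor_eve & good_bob
    B = poor_eve - good_bob
    return R, A, B


m, beta = 10, 0.3
n = 2 ** m
delta_n = 2.0 ** (-(n ** beta))  # one admissible choice of the security function
R, A, B = strong_security_sets(0.1, 0.5, m, beta, delta_n)
print(len(R), len(A), len(B))
```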

The strong-security coding scheme of (51)-(53) is illustrated schematically in Figure 2, wherein the bold square represents the set of $n$ bit-channels. This set is partitioned in two ways: into bit-channels that are good for Bob and bad for Bob (as before), and into bit-channels that are $\delta_n$-poor and not $\delta_n$-poor for Eve. The sets $\mathcal{X}$ and $\mathcal{Y}$, with $\mathcal{X} \cup \mathcal{Y} = \mathcal{R}$, will be discussed later in this section (see (94), (95), and Section VI-F).

With the new definitions of the sets $\mathcal{R}$, $\mathcal{A}$, $\mathcal{B}$ in (51)-(53), the encoding and decoding algorithms for the strong-security coding scheme are exactly those given in Section IV for the weak-security coding scheme (although we will modify the decoding algorithm somewhat in Section VI-F).



C. The Induced Channel Is Symmetric 

We prove in the next subsection that the strong-security coding scheme indeed provides strong security. In order to do so, we first introduce and study a certain composite channel induced by our construction. Informally, this channel describes the transformation in (5) in the case where some $r$ of the bits $V_1, V_2, \ldots, V_n$ are set independently and uniformly at random, while the remaining $n - r$ bits serve as the input to the channel. This situation is depicted in Figure 3. Formally, the induced channel $\mathcal{Q}_n(W,\mathcal{R})$ is specified in terms of an arbitrary BSM channel $W$ with output alphabet $\mathcal{Z}$, and a subset $\mathcal{R}$ of $[n]$ of size $|\mathcal{R}| = r$. The input alphabet of $\mathcal{Q}_n(W,\mathcal{R})$ is $\{0,1\}^{n-r}$ and its output alphabet is $\mathcal{Z}^n$. To describe the transition probabilities of $\mathcal{Q}_n(W,\mathcal{R})$, let us introduce the following notation. Henceforth, given a vector $x \in \{0,1\}^{n-r}$ and a vector $e \in \{0,1\}^r$, let $(x;e)$ denote the vector $v \in \{0,1\}^n$ with $v_{\mathcal{R}} = e$ and $v_{\mathcal{R}^c} = x$. With this, referring to the definition of $W^n$ in (9), the $2^{n-r} \times |\mathcal{Z}|^n$ transition-probability matrix $Q$ of $\mathcal{Q}_n(W,\mathcal{R})$ is given by

$$Q(z \mid x) \overset{\text{def}}{=} \frac{1}{2^r} \sum_{e\in\{0,1\}^r} W^n\big(z \mid (x;e)G_n\big) \tag{54}$$

for all $x \in \{0,1\}^{n-r}$ and all $z \in \mathcal{Z}^n$. It can be readily seen that if $V = (V_1, V_2, \ldots, V_{n-r})$ is the random vector at the input to the channel in Figure 3 and $Z = (Z_1, Z_2, \ldots, Z_n)$ is the random vector at the channel output, then indeed

$$Q(z \mid x) = \Pr\{Z = z \mid V = x\} \tag{55}$$

Our main goal in this subsection is to show that the induced channel $\mathcal{Q}_n(W,\mathcal{R})$ in Figure 3 is symmetric. This follows as a special case of the more general results in [5]. Although [5] precedes this paper chronologically, it is not yet publicly available. Therefore, we include a complete proof herein.

Recall that a group action of an abelian group $A$ on a set $\mathcal{Y}$ is a function from $A \times \mathcal{Y}$ to $\mathcal{Y}$, denoted $(a, y) \mapsto a.y$, with the following properties:

P1. $0.y = y$ for all $y \in \mathcal{Y}$, where $0$ is the identity of $A$;
P2. $(a+b).y = a.(b.y)$ for all $a, b \in A$ and all $y \in \mathcal{Y}$, where $+$ denotes the group operation.

The orbit of $y \in \mathcal{Y}$ is the set of all points of $\mathcal{Y}$ to which $y$ can be moved by the elements of $A$. Explicitly, the orbit of $y$ is

$$\mathcal{O}(y) \overset{\text{def}}{=} \{a.y : a \in A\} \tag{56}$$

It is well known that the orbits of points in $\mathcal{Y}$ form a partition of $\mathcal{Y}$ into equivalence classes (under the equivalence relation $y_1 \sim y_2$ iff there exists an $a \in A$ with $y_2 = a.y_1$).

Theorem 11. Let $(\mathcal{X}, \mathcal{Y}, W)$ be a DMC, and suppose that $\mathcal{X}$ is an abelian group under the binary operation $+$. Further, suppose that there exists a group action $.$ of $\mathcal{X}$ on $\mathcal{Y}$ such that

$$W(y \mid a + x) = W(a.y \mid x) \tag{57}$$

for all $a, x \in \mathcal{X}$ and all $y \in \mathcal{Y}$. Then the DMC $(\mathcal{X}, \mathcal{Y}, W)$ is necessarily a symmetric channel.

Proof. We partition the set $\mathcal{Y}$ into orbits formed by the group action $.$ of $\mathcal{X}$. Let $\mathcal{O}$ be an orbit, and let $M$ be the $|\mathcal{X}| \times |\mathcal{O}|$ column submatrix of $W$ consisting of those columns that are indexed by the elements of $\mathcal{O}$. It would suffice to prove that $M$ is a strongly symmetric matrix (every row of $M$ is a permutation of every other row, and every column is a permutation of every other column), regardless of the choice of $\mathcal{O}$.

Consider two arbitrary rows of $M$ indexed, say, by the elements $x_1 \in \mathcal{X}$ and $x_2 \in \mathcal{X}$. Set $a = x_2 + (-x_1)$, where $-x_1$ is the inverse of $x_1$ in the group $\mathcal{X}$. Then we have $x_2 = a + x_1$. Therefore, (57) implies that

$$W(y \mid x_2) = W(a.y \mid x_1) \qquad \text{for all } y \in \mathcal{Y} \tag{58}$$

It is easy to see from (56) and properties P1 and P2 that the map $y \mapsto a.y$ is bijective on $\mathcal{O}$. Together with (58), this shows that the rows of $M$ indexed by $x_1$ and $x_2$ are permutations of each other.

Now consider two arbitrary columns of $M$ indexed by the elements $y_1 \in \mathcal{O}$ and $y_2 \in \mathcal{O}$. Then, by the definition of an orbit, there exists an $a \in \mathcal{X}$ such that $y_2 = a.y_1$, and (57) implies that

$$W(y_2 \mid x) = W(y_1 \mid a + x) \qquad \text{for all } x \in \mathcal{X} \tag{59}$$

It is clear that the map $x \mapsto a + x$ is bijective on $\mathcal{X}$. Thus (59) shows that the columns of $M$ are permutations of each other. ■

Figure 3. Block diagram of the induced channel $\mathcal{Q}_n(W,\mathcal{R})$

Notice that the input alphabet $\{0,1\}^{n-r}$ of $\mathcal{Q}_n(W,\mathcal{R})$ is an abelian group, with the group operation $+$ being the componentwise modulo-2 addition of vectors in $\mathbb{F}_2^{n-r}$. Consequently, in order to prove that the channel $\mathcal{Q}_n(W,\mathcal{R})$ is symmetric, it would suffice to construct a group action of $\{0,1\}^{n-r}$ on $\mathcal{Z}^n$ that satisfies (57). To do so, we will use the fact that $W$ itself is symmetric. As noted by Arıkan [3], a binary-input channel $W$ with output alphabet $\mathcal{Z}$ is symmetric if and only if there exists a permutation $\pi_1$ on $\mathcal{Z}$ such that

$$\pi_1 = \pi_1^{-1} \qquad (\pi_1 \text{ is an involution}) \tag{60}$$
$$W(z \mid 0) = W(\pi_1(z) \mid 1) \qquad \text{for all } z \in \mathcal{Z} \tag{61}$$

Let $\pi_0$ be the identity permutation on $\mathcal{Z}$. Following Arıkan [3], let us define a group action of the additive group of $\mathbb{F}_2 = \{0,1\}$ on the set $\mathcal{Z}$ as follows: $x.z = \pi_x(z)$ for all $x \in \mathbb{F}_2$ and $z \in \mathcal{Z}$. It is trivial to verify that $x.z$ has the required group-action properties P1 and P2, and that for all $a, x \in \mathbb{F}_2$ and $z \in \mathcal{Z}$, we have

$$W(z \mid a + x) = W(a.z \mid x) \tag{62}$$

As in [3], we can extend this function componentwise to a group action of the additive group of $\mathbb{F}_2^n$ on the set $\mathcal{Z}^n$ as follows:

$$x.z \overset{\text{def}}{=} (x_1.z_1, x_2.z_2, \ldots, x_n.z_n) \tag{63}$$
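Conditions (60)-(62) and the orbit argument of Theorem 11 are easy to test mechanically on small channels. The sketch below uses exact rational arithmetic and checks them for a BEC, whose output alphabet $\{0, 1, ?\}$ splits into the orbits $\{0,1\}$ and $\{?\}$.

For contrast, it also shows that a Z-channel fails. The Z-channel is a standard non-symmetric example chosen here, not one from the text. The helper names are hypothetical.

```python
from fractions import Fraction

OUTPUTS = (0, 1, "?")
# PI[x] is the output permutation pi_x: pi_0 is the identity, pi_1 swaps 0 and 1.
PI = {0: {0: 0, 1: 1, "?": "?"}, 1: {0: 1, 1: 0, "?": "?"}}


def act(x, z):
    """The group action x.z = pi_x(z) of F_2 on the output alphabet."""
    return PI[x][z]


def bec(eps):
    """W[(z, x)] = W(z | x) for BEC(eps)."""
    return {(z, x): eps if z == "?" else (1 - eps if z == x else Fraction(0))
            for x in (0, 1) for z in OUTPUTS}


def z_channel(p):
    """A non-symmetric binary channel: input 0 is received error-free."""
    W = {(z, x): Fraction(0) for x in (0, 1) for z in OUTPUTS}
    W[(0, 0)] = Fraction(1)
    W[(0, 1)], W[(1, 1)] = p, 1 - p
    return W


def satisfies_60_to_62(W):
    involution = all(act(1, act(1, z)) == z for z in OUTPUTS)           # (60)
    flip = all(W[(z, 0)] == W[(act(1, z), 1)] for z in OUTPUTS)          # (61)
    covariant = all(W[(z, (a + x) % 2)] == W[(act(a, z), x)]             # (62)
                    for a in (0, 1) for x in (0, 1) for z in OUTPUTS)
    return involution and flip and covariant


def strongly_symmetric_blocks(W):
    """Theorem 11's conclusion: in each orbit's column block, rows and columns
    are permutations of one another."""
    orbits = {frozenset(act(a, z) for a in (0, 1)) for z in OUTPUTS}
    for orbit in orbits:
        cols = sorted(orbit, key=str)
        rows = [[W[(z, x)] for z in cols] for x in (0, 1)]
        columns = [list(c) for c in zip(*rows)]
        if any(sorted(r) != sorted(rows[0]) for r in rows):
            return False
        if any(sorted(c) != sorted(columns[0]) for c in columns):
            return False
    return True


for name, W in (("BEC(1/4)", bec(Fraction(1, 4))), ("Z(1/4)", z_channel(Fraction(1, 4)))):
    print(name, satisfies_60_to_62(W), strongly_symmetric_blocks(W))
```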



The following lemma was proved by Arıkan in [3, Propositions 12 and 13]. We provide a simple proof herein, for completeness.

Lemma 12. Let $G_n = P_n G^{\otimes m}$ be the Arıkan transform matrix. Then for all $a, x \in \mathbb{F}_2^n$ and all $z \in \mathcal{Z}^n$, we have

$$W^n\big(z \mid (a+x)G_n\big) = W^n\big(aG_n.z \mid xG_n\big) \tag{64}$$



Proof. In fact, the lemma is true for an arbitrary $n \times n$ binary matrix (in other words, for any linear transformation from $\mathbb{F}_2^n$ to itself). First, let us show that

$$W^n(z \mid b + c) = W^n(b.z \mid c) \tag{65}$$

for all $b, c \in \mathbb{F}_2^n$ and all $z \in \mathcal{Z}^n$. This follows directly from the definition of $W^n$. Indeed, expanding both sides of (65) as

$$\prod_{i=1}^{n} W(z_i \mid b_i + c_i) = \prod_{i=1}^{n} W(b_i.z_i \mid c_i)$$

we conclude that (65) is implied by (62). Let us now set $b = aG_n$ and $c = xG_n$. Then the lemma follows from (65) along with the fact that multiplication by a matrix is a linear operation, that is, $(a+x)G_n = aG_n + xG_n = b + c$. ■



We now depart from Arıkan [3], and introduce a group action $\circ$ of the additive group of $\mathbb{F}_2^{n-r}$ on $\mathcal{Z}^n$, defined as follows. Recall that $(x;e)$ denotes the vector $v \in \{0,1\}^n$ with $v_{\mathcal{R}} = e$ and $v_{\mathcal{R}^c} = x$. With this, we define for all $x \in \mathbb{F}_2^{n-r}$ and all $z \in \mathcal{Z}^n$

$$x \circ z \overset{\text{def}}{=} (x;0)G_n.z \tag{66}$$

where the group action $.$ on the right-hand side is the one defined in (63). Again, it is easy to verify that (66) satisfies P1 and P2. Our main result in this subsection is the following proposition.

Proposition 13. The induced channel $\mathcal{Q}_n(W,\mathcal{R})$ is symmetric.

Proof. In light of Theorem 11, it would suffice to prove that the transition-probability matrix $Q$ defined in (54) satisfies

$$Q(z \mid a + x) = Q(a \circ z \mid x)$$

for all $a, x \in \mathbb{F}_2^{n-r}$ and $z \in \mathcal{Z}^n$. Expanding the vector $(a+x;e)$ as $(a;0) + (x;e)$, and substituting in (54), we obtain

$$\begin{aligned}
Q(z \mid a+x) &= \frac{1}{2^r}\sum_{e\in\{0,1\}^r} W^n\big(z \mid ((a;0)+(x;e))G_n\big) \qquad (67)\\
&= \frac{1}{2^r}\sum_{e\in\{0,1\}^r} W^n\big((a;0)G_n.z \mid (x;e)G_n\big) \qquad (68)\\
&= \frac{1}{2^r}\sum_{e\in\{0,1\}^r} W^n\big(a \circ z \mid (x;e)G_n\big) \qquad (69)
\end{aligned}$$

where (68) follows from Lemma 12 and (69) follows from (66). But (69) is precisely $Q(a \circ z \mid x)$ by (54), and we are done. ■
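Proposition 13 can also be confirmed by brute force for a toy case. The following sketch builds the transition probabilities (54) of $\mathcal{Q}_n(W,\mathcal{R})$ for a BSC with $n = 4$ and an arbitrary set $\mathcal{R}$. It implements the action (66); for a BSC, $\pi_1$ simply flips the output bit. It then checks $Q(z \mid a + x) = Q(a \circ z \mid x)$ for every $a$, $x$, and $z$.

$P_n = I$ is assumed, which is harmless because Lemma 12 holds for any matrix. The names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def polar_matrix(n):
    """G^{(x)m} as a 0/1 matrix: entry (i, j) is 1 iff i & j == j (P_n = I assumed)."""
    return [[1 if (i & j) == j else 0 for j in range(n)] for i in range(n)]


def times(v, G):
    """Row vector v times matrix G over GF(2), returned as a tuple."""
    return tuple(sum(v[i] * G[i][j] for i in range(len(v))) % 2 for j in range(len(G[0])))


def embed(x, e, R, n):
    """(x; e): the length-n vector v with v_R = e and v_{R^c} = x."""
    v = [0] * n
    for i, bit in zip([i for i in range(n) if i not in R], x):
        v[i] = bit
    for i, bit in zip(sorted(R), e):
        v[i] = bit
    return v


def bsc_Wn(z, x, p):
    """W^n(z | x) for n uses of a memoryless BSC(p)."""
    prob = Fraction(1)
    for zi, xi in zip(z, x):
        prob *= p if zi != xi else 1 - p
    return prob


def Q(z, x, R, n, p, G):
    """Transition probability (54) of the induced channel Q_n(W, R)."""
    r = len(R)
    total = sum(bsc_Wn(z, times(embed(x, e, R, n), G), p)
                for e in product((0, 1), repeat=r))
    return total / 2 ** r


def circ(a, z, R, n, G):
    """a o z = (a; 0) G_n . z from (66); for a BSC, pi_1 flips a bit."""
    shift = times(embed(a, (0,) * len(R), R, n), G)
    return tuple(zi ^ si for zi, si in zip(z, shift))


n, R, p = 4, {0, 2}, Fraction(1, 10)
G = polar_matrix(n)
inputs = list(product((0, 1), repeat=n - len(R)))
outputs = list(product((0, 1), repeat=n))
ok = all(Q(z, tuple((ai + xi) % 2 for ai, xi in zip(a, x)), R, n, p, G)
         == Q(circ(a, z, R, n, G), x, R, n, p, G)
         for a in inputs for x in inputs for z in outputs)
print(ok)
```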



D. Proof of Strong Security 

Let us begin with a simple lemma that relates the capacity of the bit-channels in (11) to the transformation in (5). Although this lemma is well known, we include a proof for completeness.

Lemma 14. Let $W$ be an arbitrary BSM channel. Suppose the vector $V = (V_1, V_2, \ldots, V_n)$ at the input to the transformation in (5) is uniform over $\{0,1\}^n$, and let $Z = (Z_1, Z_2, \ldots, Z_n)$ be the random vector at the output of the transformation. Further, let $W_1, W_2, \ldots, W_n$ be the corresponding bit-channels, defined in (11). Then for all $i \in [n]$, the capacity of $W_i$ is given by

$$C(W_i) = I(V_i; Z, V_1, V_2, \ldots, V_{i-1})$$

Proof. Let $V^i$ denote the random vector $(V_1, V_2, \ldots, V_i)$ for all $i \in [n]$, as before. It is shown in [3] that if $W$ is symmetric, then so is $W_i$ for all $i \in [n]$. Hence, the capacity of $W_i$ is the mutual information between its input and output when the input is uniform over $\{0,1\}$. Thus it would suffice to show that

$$W_i(z, v \mid x) = \Pr\{Z = z, V^{i-1} = v \mid V_i = x\} \tag{70}$$

for all $z \in \mathcal{Z}^n$, $v \in \{0,1\}^{i-1}$, and $x \in \{0,1\}$. Since $V_i$ is uniform, we can rewrite the right-hand side of (70) as follows:

$$\frac{\Pr\{Z = z, V^{i-1} = v, V_i = x\}}{\Pr\{V_i = x\}} = 2\Pr\{Z = z, V^i = (v,x)\}$$

Observe that the event $\{V^i = (v,x)\}$ is the union of $2^{n-i}$ disjoint events $\{V = (v,x,\tilde{v})\}$, as $\tilde{v}$ ranges over $\{0,1\}^{n-i}$ (or $\tilde{v}$ is the empty string, if $i = n$). Since $V$ is uniform over $\{0,1\}^n$, the probability of each such event is $2^{-n}$. Consequently

$$\begin{aligned}
2\Pr\{Z = z, V^i = (v,x)\} &= 2\sum_{\tilde{v}\in\{0,1\}^{n-i}} \Pr\{Z = z, V = (v,x,\tilde{v})\} \qquad (71)\\
&= \frac{1}{2^{n-1}} \sum_{\tilde{v}\in\{0,1\}^{n-i}} \Pr\{Z = z \mid V = (v,x,\tilde{v})\} \qquad (72)
\end{aligned}$$

Since $Z$ and $V$ are, respectively, the output and the input of the transformation in (5), for all $w \in \{0,1\}^n$ we have

$$\Pr\{Z = z \mid V = w\} = W^n\big(z \mid wP_nG^{\otimes m}\big) = \mathbf{W}(z \mid w)$$

by the definition of the "combined" channel $\mathbf{W}$ in (10). Together with (72) and the definition of $W_i$ in (11), this shows that the right-hand side of (70) is indeed equal to $W_i(z, v \mid x)$. ■
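For a BEC, which is an illustrative assumption, both sides of Lemma 14 can be computed exactly for small $n$. The right-hand side comes from GF(2) ranks over all erasure patterns. The left-hand side is $C(W_i) = 1 - Z(W_i)$, from the Bhattacharyya recursion.

The sketch below compares the two for $n = 8$ with exact fractions. It uses 0-based indices and assumes $P_n = I$. The names are hypothetical.

```python
from fractions import Fraction


def gf2_rank(rows):
    """Rank over GF(2) of row vectors given as integer bitmasks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)  # clears b's leading bit from r if it is set
        if r:
            basis.append(r)
    return len(basis)


def polar_rows(n):
    """Rows of G^{(x)m} as bitmasks; entry (i, j) is 1 iff i & j == j (P_n = I)."""
    return [sum(1 << j for j in range(n) if i & j == j) for i in range(n)]


def bec_bhattacharyya(eps, m):
    """Exact Z(W_i) for BEC(eps); index bits MSB first, 0 = minus, 1 = plus."""
    z = [eps]
    for _ in range(m):
        z = [w for x in z for w in (2 * x - x * x, x * x)]
    return z


def bitchannel_capacity_bec(i, eps, n):
    """I(V_i; Z, V_0, ..., V_{i-1}) for uniform V and Eve's BEC(eps), exactly.

    Eve sees X_S, the unerased coordinates of X = V G. Given V_0..V_{i-1},
    X_S has entropy rank(G_{>=i,S}); once V_i is also known, its entropy is
    rank(G_{>i,S}). The difference is the information X_S carries about V_i.
    """
    G = polar_rows(n)
    total = Fraction(0)
    for S in range(1 << n):
        s = bin(S).count("1")
        prob = (1 - eps) ** s * eps ** (n - s)
        rows = [G[j] & S for j in range(i, n)]
        total += prob * (gf2_rank(rows) - gf2_rank(rows[1:]))
    return total


eps, m = Fraction(1, 3), 3
n = 2 ** m
z = bec_bhattacharyya(eps, m)
caps = [bitchannel_capacity_bec(i, eps, n) for i in range(n)]
print([str(c) for c in caps])
```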

The next lemma combines Proposition 13 with Lemma 14 to upper-bound the capacity of the induced channel $\mathcal{Q}_n(W,\mathcal{R})$.

Lemma 15. Let $W$ be an arbitrary BSM channel. For $n = 2^m$, let $W_1, W_2, \ldots, W_n$ denote the corresponding bit-channels. Then for all $\mathcal{R} \subseteq [n]$, the capacity of the induced channel $\mathcal{Q}_n(W,\mathcal{R})$, defined in (54), is upper-bounded as follows:

$$C\big(\mathcal{Q}_n(W,\mathcal{R})\big) \le \sum_{i\in\mathcal{R}^c} C(W_i) \tag{73}$$



Proof. Consider again the transformation in (5), with the input vector $V = (V_1, V_2, \ldots, V_n)$ being uniform over $\{0,1\}^n$, and let $Z = (Z_1, Z_2, \ldots, Z_n)$ denote the output of the transformation, as in Lemma 14. Then

$$C\big(\mathcal{Q}_n(W,\mathcal{R})\big) = I(V_{\mathcal{R}^c}; Z) \tag{74}$$

This is so because $\mathcal{Q}_n(W,\mathcal{R})$ is symmetric by Proposition 13, and the capacity of a symmetric DMC is given by the mutual information between its input and output under the uniform input distribution [14, Theorem 4.5.2]. Next, write $\mathcal{R}^c = [n] \setminus \mathcal{R}$ as

$$\mathcal{R}^c = \{i_1, i_2, \ldots, i_{n-r}\}$$

where $r = |\mathcal{R}|$, and assume w.l.o.g. that $i_1 < i_2 < \cdots < i_{n-r}$. The lemma can now be proved via a sequence of simple equalities and inequalities, as follows:

$$\begin{aligned}
I(V_{\mathcal{R}^c};Z) &= I(V_{i_1}, V_{i_2}, \ldots, V_{i_{n-r}}; Z) \qquad (75)\\
&= \sum_{j=1}^{n-r} I(V_{i_j}; Z \mid V_{i_1}, V_{i_2}, \ldots, V_{i_{j-1}}) \qquad (76)\\
&= \sum_{j=1}^{n-r} I(V_{i_j}; Z, V_{i_1}, V_{i_2}, \ldots, V_{i_{j-1}}) \qquad (77)\\
&\le \sum_{j=1}^{n-r} I(V_{i_j}; Z, V_1, V_2, \ldots, V_{i_j-1}) \qquad (78)
\end{aligned}$$

The equality (76) is the chain rule for mutual information. The equality (77) follows from the fact that $I(X;Z \mid Y) = I(X;Z,Y)$ for all random variables $X, Y, Z$ such that $X$ and $Y$ are independent. To establish (78), we adjoin to the set of random variables $\{V_{i_1}, V_{i_2}, \ldots, V_{i_{j-1}}\}$ its complement in the set $\{V_1, V_2, \ldots, V_{i_j-1}\}$. Clearly, this cannot decrease the mutual information. Lastly, we observe that the summation in (78) is equal to the summation on the right-hand side of (73) by Lemma 14. ■
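Lemma 15 admits the same kind of exact check for a BEC with small $n$, again as an illustrative assumption. The sketch computes $I(V_{\mathcal{R}^c};Z)$ for uniform $V$, which equals $C(\mathcal{Q}_n(W,\mathcal{R}))$ by Proposition 13, and compares it with the sum in (73).

With $\mathcal{R} = \emptyset$ the two sides coincide, since both equal $nC(W)$. The test sets are arbitrary, $P_n = I$ is assumed, and the names are hypothetical.

```python
from fractions import Fraction


def gf2_rank(rows):
    """Rank over GF(2) of row vectors given as integer bitmasks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)  # clears b's leading bit from r if it is set
        if r:
            basis.append(r)
    return len(basis)


def polar_rows(n):
    """Rows of G^{(x)m} as bitmasks; entry (i, j) is 1 iff i & j == j (P_n = I)."""
    return [sum(1 << j for j in range(n) if i & j == j) for i in range(n)]


def bec_bhattacharyya(eps, m):
    """Exact Z(W_i) for BEC(eps); index bits MSB first, 0 = minus, 1 = plus."""
    z = [eps]
    for _ in range(m):
        z = [w for x in z for w in (2 * x - x * x, x * x)]
    return z


def induced_capacity_bec(R, eps, n):
    """C(Q_n(W, R)) = I(V_{R^c}; Z) for Eve's BEC(eps) and uniform V, exactly.

    Over a BEC, Z reveals X_S = (V G)_S. X_S has entropy rank(G_S), and given
    V_{R^c} its entropy is rank(G_{R,S}), so I(V_{R^c}; X_S) is the difference.
    """
    G = polar_rows(n)
    total = Fraction(0)
    for S in range(1 << n):
        s = bin(S).count("1")
        prob = (1 - eps) ** s * eps ** (n - s)
        rank_all = gf2_rank([g & S for g in G])
        rank_R = gf2_rank([G[i] & S for i in R])
        total += prob * (rank_all - rank_R)
    return total


eps, m = Fraction(1, 2), 3
n = 2 ** m
z = bec_bhattacharyya(eps, m)
for R in (set(), {7}, {3, 5, 6, 7}, {0, 2, 4, 6}):
    lhs = induced_capacity_bec(R, eps, n)
    rhs = sum(1 - z[i] for i in range(n) if i not in R)  # right-hand side of (73)
    print(sorted(R), lhs, rhs, lhs <= rhs)
```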

With Lemma 15 in hand, we are finally ready to establish our main result in this section.

Proposition 16. Let $U$ be the message at the input to the encoder for the strong-security coding scheme. Then, regardless of the a priori distribution of $U$, we have

$$I(U;Z) \le \delta_n\,|\mathcal{P}_n(W,\delta_n)| \tag{79}$$

where $Z$ is the output of the wiretap channel, and $\mathcal{P}_n(W,\delta_n)$ is the index set of $\delta_n$-poor bit-channels, as defined in (50).

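Proof. Recall that our encoder first constructs the vector $V$ with $V_{\mathcal{A}} = U$, $V_{\mathcal{B}} = 0$, and $V_{\mathcal{R}}$ uniform over $\{0,1\}^r$. Hence

$$I(U;Z) = I(V_{\mathcal{A}};Z) = I(V_{\mathcal{A}\cup\mathcal{B}};Z)$$

Since the sets $\mathcal{A}$, $\mathcal{B}$, $\mathcal{R}$ partition $[n]$, the vector $V_{\mathcal{A}\cup\mathcal{B}}$ can be regarded as the input to the induced channel $\mathcal{Q}_n(W,\mathcal{R})$. Moreover, since $\mathcal{R}^c = \mathcal{P}_n(W,\delta_n)$ by (51), Lemma 15 implies that

$$I(V_{\mathcal{A}\cup\mathcal{B}};Z) \le C\big(\mathcal{Q}_n(W,\mathcal{R})\big) \le \sum_{i\in\mathcal{P}_n(W,\delta_n)} C(W_i)$$

The proposition now follows by observing that $C(W_i) \le \delta_n$ for all $i \in \mathcal{P}_n(W,\delta_n)$, by the definition of $\mathcal{P}_n(W,\delta_n)$ in (50). ■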

Note that we are still free to specify the function $\delta_n$ in (51) and (79). This means that the security of our coding scheme is tunable. Let us henceforth refer to $\delta_n$ as the security function; this function is a design parameter in our scheme. Proposition 16 implies that choosing different settings for the security function guarantees different levels of security.

Theorem 17. For any security function such that $\delta_n = o(1/n)$, the strong-security coding scheme guarantees strong security.

Proof. This follows from Proposition 16, along with the definition of strong security in (3) and the fact that $|\mathcal{P}_n(W,\delta_n)| \le n$. ■

In fact, we shall see in the next subsection (cf. Theorem 21) that we can achieve the secrecy capacity while setting the security function to be as small as $\delta_n = 2^{-n^\beta}$ for any positive constant $\beta < 1/2$. In this case, our coding scheme guarantees that the mutual information between the message $U$ and Eve's observations $Z$ scales roughly as

$$I(U;Z) = O\big(2^{-n^{1/2-\varepsilon}}\big) \tag{80}$$

for any $\varepsilon > 0$. Note that this holds regardless of the a priori distribution of the message.



E. Rate of the Strong-Security Coding Scheme 

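Let $R_n = k/n = |\mathcal{A}|/n$ denote the rate of the strong-security coding scheme, where $\mathcal{A}$ is the set defined in (52). Note that

$$\mathcal{A} = \mathcal{P}_n(W,\delta_n) \setminus \mathcal{B}_n(W^*,\beta)$$

since the sets $\mathcal{G}_n(W^*,\beta)$ and $\mathcal{B}_n(W^*,\beta)$ of good and bad channels in (13) and (14) are complements of each other. It follows that

$$R_n \ge \frac{|\mathcal{P}_n(W,\delta_n)|}{n} - \frac{|\mathcal{B}_n(W^*,\beta)|}{n} \tag{81}$$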
The asymptotic behavior of the fraction $|\mathcal{B}_n(W^*,\beta)|/n$ is given by Theorem 1. Therefore, in order to prove that the rate of the strong-security coding scheme approaches secrecy capacity, it remains to analyze the asymptotic behavior of $|\mathcal{P}_n(W,\delta_n)|/n$. In this regard, a recent result of Hassani and Urbanke [16] will be useful. To describe this result, we need to introduce some notation. Given a BSM channel $W$ and a positive $\gamma < 1$, let

$$\mathcal{P}'_n(W,\gamma) \overset{\text{def}}{=} \{\, i \in [n] : Z(W_i) \ge 1 - \gamma \,\} \tag{82}$$



This is similar to the definition of the set $\mathcal{P}_n(W,\delta)$ in (50), except that (82) uses the Bhattacharyya parameters $Z(W_i)$ instead of the channel capacities $C(W_i)$. Given a positive integer $m$ and a real number $\xi$ in the open interval $(0,1)$, let $a = a(m,\xi)$ denote the unique positive integer such that

$$\sum_{i=a}^{m} \binom{m}{i} \le \xi\,2^m < \sum_{i=a-1}^{m} \binom{m}{i} \tag{83}$$

and define $\alpha(m,\xi) = a(m,\xi)/m$. Further, we say that a function $f(m)$ from the positive integers to the interval $(0,1)$ is in the intersection of $o(1/\sqrt{m})$ and $\omega(1/m)$ if

$$\lim_{m\to\infty} \big(\sqrt{m}\,f(m)\big) = 0 \quad \text{and} \quad \lim_{m\to\infty} \big(m\,f(m)\big) = \infty$$
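The integer $a(m,\xi)$ in (83) is straightforward to compute exactly with big-integer binomial coefficients. The sketch below does so. It then compares the result with the leading terms $m/2 + (\sqrt{m}/2)\,Q^{-1}(\xi)$, which the proof of Corollary 19 below derives from Stirling's formula.

`a_exact` and `a_normal` are hypothetical names. Python's `statistics.NormalDist` is used to supply $Q^{-1}(\xi) = \Phi^{-1}(1-\xi)$.

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist


def a_exact(m, xi):
    """a(m, xi) from (83): sum_{i>=a} C(m, i) <= xi 2^m < sum_{i>=a-1} C(m, i)."""
    target = Fraction(xi) * 2 ** m
    a, tail = m + 1, 0  # invariant: tail = sum_{i=a}^{m} C(m, i) <= target
    while tail + comb(m, a - 1) <= target:
        a -= 1
        tail += comb(m, a)
    return a


def a_normal(m, xi):
    """Leading terms m/2 + (sqrt(m)/2) Q^{-1}(xi), with Q the standard normal tail."""
    q_inv = NormalDist().inv_cdf(1 - xi)  # Q^{-1}(xi) = Phi^{-1}(1 - xi)
    return m / 2 + sqrt(m) / 2 * q_inv


for xi in (0.3, 0.7):
    for m in (100, 400, 1600):
        a = a_exact(m, xi)
        # alpha(m, xi) = a(m, xi) / m tends to 1/2 at rate 1/sqrt(m)
        print(xi, m, a, round(a_normal(m, xi), 2), round(a / m, 4))
```

For example, $a(10, 1/2) = 6$: the upper tail sum from 6 to 10 is $386 \le 512$, while the tail sum from 5 to 10 is $638 > 512$.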



The following is (the second part of) Theorem 3 of Hassani and Urbanke [16]. Although this result is more general than what we need, we state it below exactly as in [16, Theorem 3].

Theorem 18. Let $W$ be a BSM channel, and let $\xi < 1$ be a positive constant. Fix an arbitrary function $f(m)$ in the intersection of $o(1/\sqrt{m})$ and $\omega(1/m)$, and for all $n = 2^m$ define

$$\gamma_n \overset{\text{def}}{=} 2^{-n^{\alpha(m,\xi)(1+f(m))}} \tag{84}$$

where $\alpha(m,\xi)$ is the function defined via (83). Then the asymptotic behavior of the fraction $|\mathcal{P}'_n(W,\gamma_n)|/n$ is given by

$$\lim_{n\to\infty} \frac{|\mathcal{P}'_n(W,\gamma_n)|}{n} = \xi\big(1 - C(W)\big) \tag{85}$$



In Corollary 19 and Proposition 20 we specialize the general result of Theorem 18 to our needs.

Corollary 19. Let $W$ be an arbitrary BSM channel. Then for any positive constant $\beta < 1/2$ we have

$$\lim_{n\to\infty} \frac{|\mathcal{P}'_n(W,2^{-n^\beta})|}{n} = 1-C(W) \tag{86}$$



Proof. Applying the Stirling formula to both sides of (83), it can be shown (cf. [16]) that

$$a(m,\xi) = \frac{m}{2} + \frac{Q^{-1}(\xi)}{2}\sqrt{m} + o(\sqrt{m})$$

where $Q(x)$ is the probability that a standard normal random variable will obtain a value larger than $x$. Consequently, for all positive $\xi < 1$, there exists a positive constant $c_\xi$ such that

$$\alpha(m,\xi) \;\ge\; \frac{1}{2} - \frac{c_\xi}{\sqrt{m}}$$

for all sufficiently large $m$. This implies that for any positive constant $\beta < 1/2$, any function $f(m)$ from the positive integers to the interval $(0,1)$, and for all sufficiently large $m$, the following inequality holds:

$$\alpha(m,\xi)\bigl(1+f(m)\bigr) \;>\; \frac{1}{2} - \frac{c_\xi}{\sqrt{m}} \;>\; \beta$$

This, in turn, implies that for all sufficiently large $n = 2^m$, we have $\gamma_n < 2^{-n^\beta}$, where $\gamma_n$ is the function defined in (84). Therefore, the set $\mathcal{P}'_n(W,\gamma_n)$ is a subset of the set $\mathcal{P}'_n(W,2^{-n^\beta})$ for all sufficiently large $n$.

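Let us assume for a moment that the limit on the left-hand side of (86) exists, and call it $L$. If so, we can conclude from Theorem 18, along with the fact that $\mathcal{P}'_n(W,\gamma_n) \subseteq \mathcal{P}'_n(W,2^{-n^\beta})$ for all sufficiently large $n$, that

$$L = \lim_{n\to\infty} \frac{|\mathcal{P}'_n(W,2^{-n^\beta})|}{n} \;\ge\; \lim_{n\to\infty} \frac{|\mathcal{P}'_n(W,\gamma_n)|}{n} = \xi\bigl(1-C(W)\bigr)$$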



Since this holds for all positive $\xi < 1$, we must have $L \ge 1 - C(W)$. Now observe that the set $\mathcal{P}'_n(W,2^{-n^\beta})$ and the set $\mathcal{G}_n(W,\beta)$ of good channels, defined in (13), do not intersect. Therefore, for all $n = 2^m$ we have

$$\frac{|\mathcal{P}'_n(W,2^{-n^\beta})|}{n} + \frac{|\mathcal{G}_n(W,\beta)|}{n} \;\le\; 1$$

Together with Theorem 1, this implies that the limit $L$ indeed exists, and is equal to $1 - C(W)$. ∎
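The asymptotic expansion of $a(m,\xi)$ used in the proof above can be checked numerically. The following Python sketch computes $a(m,\xi)$ exactly from definition (83), using exact rational arithmetic so that $\xi 2^m$ cannot overflow, and compares $\alpha(m,\xi) = a(m,\xi)/m$ with the leading terms $\frac{1}{2} + Q^{-1}(\xi)/(2\sqrt{m})$. The function names `a_exact` and `alpha_gaussian` are illustrative choices, not notation from [16].

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist


def a_exact(m, xi):
    """Smallest a with sum_{i=a}^{m} C(m, i) <= xi * 2^m, as in (83).

    For this a, the tail starting at a - 1 already exceeds xi * 2^m, so both
    inequalities in (83) hold. Exact rationals avoid float overflow in 2^m.
    """
    if not 0 < xi < 1:
        raise ValueError("xi must lie in the open interval (0, 1)")
    target = Fraction(xi) * 2**m
    tail = 0  # sum_{i=a+1}^{m} C(m, i) at the top of each iteration
    for a in range(m, -1, -1):
        new_tail = tail + comb(m, a)  # now sum_{i=a}^{m} C(m, i)
        if new_tail > target:
            return a + 1
        tail = new_tail
    raise AssertionError("unreachable: the full sum 2^m exceeds xi * 2^m")


def alpha_gaussian(m, xi):
    """Leading terms 1/2 + Q^{-1}(xi) / (2 sqrt(m)) of alpha(m, xi)."""
    q_inv = NormalDist().inv_cdf(1 - xi)  # Q^{-1}(xi) = Phi^{-1}(1 - xi)
    return 0.5 + q_inv / (2 * sqrt(m))


for m in (100, 400, 1600):
    for xi in (0.1, 0.5, 0.9):
        alpha = a_exact(m, xi) / m
        print(f"m={m:5d} xi={xi}: alpha={alpha:.4f} approx={alpha_gaussian(m, xi):.4f}")
```

The two printed values should move closer together as $m$ grows, since the neglected part of $\alpha(m,\xi)$ is $o(1/\sqrt{m})$.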

Proposition 20. Let $W$ be an arbitrary BSM channel, and let $\beta < 1/2$ be a positive constant. Further, let $\mathcal{P}_n(W,\delta_n)$ be the index set of $\delta_n$-poor bit-channels, as defined in (50), and suppose that there exist positive constants $c_1$ and $c_2$ such that

$$c_1\, 2^{-n^\beta} \;\le\; \delta_n \;\le\; 1 - c_2 \tag{87}$$

for all sufficiently large $n$. Then the asymptotic behavior of the fraction $|\mathcal{P}_n(W,\delta_n)|/n$ is given by

$$\lim_{n\to\infty} \frac{|\mathcal{P}_n(W,\delta_n)|}{n} = 1-C(W) \tag{88}$$



Proof. Let $W_1, W_2, \ldots, W_n$ denote the bit-channels, as before. It was shown by Arıkan in [3, Proposition 1] that

$$C(W_i) \;\le\; \sqrt{1 - Z(W_i)^2} \tag{89}$$

for all $i \in [n]$. Fix a positive constant $\alpha$ such that $\beta < \alpha < 1/2$. Since $Z(W_i) \ge 1 - 2^{-n^\alpha}$ for all $i \in \mathcal{P}'_n(W,2^{-n^\alpha})$ by definition, and since $1 - Z(W_i)^2 = (1 - Z(W_i))(1 + Z(W_i)) \le 2\,(1 - Z(W_i))$, Arıkan's bound (89) implies that

$$C(W_i) \;\le\; 2^{-(n^\alpha - 1)/2} \tag{90}$$

for all $i \in \mathcal{P}'_n(W,2^{-n^\alpha})$. For all sufficiently large $n$, the right-hand side of (90) is less than the left-hand side of (87), and therefore

$$\mathcal{P}'_n(W,2^{-n^\alpha}) \subseteq \mathcal{P}_n(W,\delta_n) \tag{91}$$

Also note that the condition $\delta_n \le 1 - c_2$ on the right-hand side of (87) implies that for all sufficiently large $n$, we have

$$\mathcal{P}_n(W,\delta_n) \cap \mathcal{G}_n(W,\beta) = \varnothing \tag{92}$$

The proposition now follows by combining (91) and (92) with Corollary 19 and Theorem 1, respectively. ∎
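Both analytic steps in the proof above can be illustrated numerically. The Python sketch below checks Arıkan's bound (89) on binary symmetric channels $\mathrm{BSC}(p)$, for which $\sqrt{1 - Z^2} = 1 - 2p$, and then searches for the smallest $m$ at which the right-hand side of (90) drops below the left-hand side of (87), comparing base-2 logarithms because both quantities underflow double precision. The values of $\alpha$, $\beta$, and $c_1$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from math import log2, sqrt


def h2(p):
    """Binary entropy function in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_capacity(p):
    return 1 - h2(p)


def bsc_bhattacharyya(p):
    return 2 * sqrt(p * (1 - p))


# Arikan's bound (89), C(W) <= sqrt(1 - Z(W)^2), on a grid of BSC(p) channels.
for k in range(1, 500):
    p = k / 1000
    z = bsc_bhattacharyya(p)
    assert bsc_capacity(p) <= sqrt(max(0.0, 1 - z * z)) + 1e-12


def smallest_m(alpha, beta, c1):
    """Smallest m with 2^{-(n^alpha - 1)/2} < c1 * 2^{-n^beta}, where n = 2^m.

    Both sides are compared through their base-2 logarithms, since the
    quantities themselves underflow double precision for moderate m.
    """
    if not 0 < beta < alpha:
        raise ValueError("need 0 < beta < alpha")
    m = 1
    while -(2 ** (alpha * m) - 1) / 2 >= log2(c1) - 2 ** (beta * m):
        m += 1
    return m


print(smallest_m(alpha=0.45, beta=0.40, c1=1.0))  # prints 21
```

With $\alpha = 0.45$ and $\beta = 0.4$, this argument guarantees the inclusion (91) only from $n = 2^{21}$ onward, which shows that "sufficiently large $n$" can mean quite long codes when $\alpha$ and $\beta$ are close.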
We are now ready to prove that for a wide range of security
functions, our strong-security coding scheme operates at a rate 
that approaches the secrecy capacity. 

Theorem 21. For any security function $\delta_n$ that satisfies (87), the rate $R_n$ of the corresponding strong-security coding scheme approaches the secrecy capacity, namely

$$\lim_{n\to\infty} R_n = C(W^*) - C(W) \tag{93}$$



Proof. A lower bound on the rate $R_n$ is given in (81). Along with Proposition 20 and Theorem 1, this immediately shows that $\liminf_{n\to\infty} R_n \ge C(W^*) - C(W)$. In order to establish equality in (93), let us partition the set $\mathcal{R} = [n] \setminus \mathcal{P}_n(W,\delta_n)$ into two subsets, as in Figure 2. These subsets are defined as follows:

$$\mathcal{X} \overset{\text{def}}{=} \mathcal{R} \cap \mathcal{B}_n(W^*,\beta) \tag{94}$$

$$\mathcal{Y} \overset{\text{def}}{=} \mathcal{R} \cap \mathcal{G}_n(W^*,\beta) \tag{95}$$

Note that the set $\mathcal{X}$ in (94) and the set $\mathcal{B}$ in (53) form a partition of $\mathcal{B}_n(W^*,\beta)$, as illustrated in Figure 2. It follows that

$$R_n = \frac{|\mathcal{P}_n(W,\delta_n)|}{n} - \frac{|\mathcal{B}_n(W^*,\beta)|}{n} + \frac{|\mathcal{X}|}{n} \tag{96}$$

We will show in Proposition 22 of the next subsection that the fraction $|\mathcal{X}|/n$ vanishes as $n \to \infty$. With this, the theorem follows from (96), Proposition 20, and Theorem 1. ∎
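The set bookkeeping behind (81) and (96) can be reproduced on a toy instance. The Python sketch below assumes binary erasure channels $W^* = \mathrm{BEC}(\epsilon^*)$ and $W = \mathrm{BEC}(\epsilon)$, chosen because the Bhattacharyya parameters of BEC bit-channels follow the exact recursion $Z \mapsto 2Z - Z^2$, $Z \mapsto Z^2$ and satisfy $C(W_i) = 1 - Z(W_i)$. It also assumes that (13) and (50) take the forms $\mathcal{G}_n(W^*,\beta) = \{i : Z(W_i^*) < 2^{-n^\beta}/n\}$ and $\mathcal{P}_n(W,\delta) = \{i : C(W_i) \le \delta\}$; the parameter values are hypothetical.

```python
def bec_bhattacharyya(eps, m):
    """Bhattacharyya parameters Z(W_i) of the n = 2^m bit-channels of BEC(eps).

    Every BEC bit-channel is again a BEC: the 'minus' branch has Z' = 2Z - Z^2
    and the 'plus' branch has Z'' = Z^2, so this recursion is exact.
    """
    z = [eps]
    for _ in range(m):
        z = [w for x in z for w in (2 * x - x * x, x * x)]
    return z


def strong_security_sets(eps_bob, eps_eve, m, beta, delta):
    """Index sets of the scheme for W* = BEC(eps_bob) and W = BEC(eps_eve)."""
    n = 2**m
    everything = set(range(n))
    z_bob = bec_bhattacharyya(eps_bob, m)
    z_eve = bec_bhattacharyya(eps_eve, m)
    threshold = 2 ** (-(n**beta)) / n                            # assumed form of (13)
    good_bob = {i for i in everything if z_bob[i] < threshold}   # G_n(W*, beta)
    bad_bob = everything - good_bob                              # B_n(W*, beta)
    poor_eve = {i for i in everything if 1 - z_eve[i] <= delta}  # P_n(W, delta), C = 1 - Z
    rand = everything - poor_eve                                 # R
    return {
        "n": n, "good_bob": good_bob, "bad_bob": bad_bob, "poor_eve": poor_eve,
        "A": poor_eve - bad_bob,   # message bits, the set in (52)
        "B": poor_eve & bad_bob,   # bits fixed to zero, the set in (53)
        "R": rand,
        "X": rand & bad_bob,       # (94)
        "Y": rand & good_bob,      # (95)
    }


s = strong_security_sets(eps_bob=0.1, eps_eve=0.4, m=12, beta=0.4, delta=0.01)
n = s["n"]
rate = len(s["A"]) / n
rhs_96 = (len(s["poor_eve"]) - len(s["bad_bob"]) + len(s["X"])) / n
print(f"R_n = {rate:.4f}, right-hand side of (96) = {rhs_96:.4f}")
print(f"|X|/n = {len(s['X']) / n:.4f}, C(W*) - C(W) = {0.9 - 0.6:.1f}")
```

The rate printed at such a short length need not be close to $C(W^*) - C(W)$; the sketch only illustrates that the partitions used in the proof and the identity (96) hold exactly.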



F. Reliability of the Strong-Security Coding Scheme 

If the main channel $W^*$ is noiseless, Bob can trivially recover the message with probability 1 as follows. Bob receives the vector $y = vG_n$. Since the Arıkan transform matrix $G_n = P_n G^{\otimes m}$ is its own inverse over $\mathbb{F}_2$, Bob can compute $v = yG_n$ and set $\hat{u} = v_{\mathcal{A}}$. Clearly $\hat{u} = u$, and condition (1) is trivially satisfied.
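The self-inverse property of $G_n$ can be confirmed by brute force for small $n$. The Python sketch below assumes that $P_n$ is the bit-reversal permutation, as in Arıkan's construction, and that $G = \bigl[\begin{smallmatrix}1&0\\1&1\end{smallmatrix}\bigr]$; it applies $G^{\otimes m}$ through the usual butterfly network rather than forming the $n \times n$ matrix. The helper names are hypothetical.

```python
from itertools import product


def kron_power_transform(v):
    """Return v * G^{(x)m} over GF(2), G = [[1, 0], [1, 1]], via butterflies."""
    x = list(v)
    n = len(x)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]  # (a, b) -> (a + b, b) on each butterfly
        step *= 2
    return x


def bit_reverse(v):
    """Apply the bit-reversal permutation to a list of length n = 2^m."""
    n = len(v)
    m = n.bit_length() - 1
    return [v[int(format(i, f"0{m}b")[::-1], 2)] for i in range(n)]


def encode(v):
    """x = v * G_n with G_n = P_n * G^{(x)m}, P_n assumed to be bit reversal."""
    return kron_power_transform(bit_reverse(v))


# Noiseless main channel: Bob receives y = v G_n and recovers v = y G_n.
m = 3
n = 2**m
for v in product((0, 1), repeat=n):
    y = encode(list(v))
    assert encode(y) == list(v)
print(f"G_n is an involution over GF(2) for n = {n}")
```

Because $G^2 = I$ over $\mathbb{F}_2$ and the bit-reversal permutation commutes with $G^{\otimes m}$, applying the encoder twice returns its input, which is exactly the decoding step Bob uses on a noiseless channel.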
What happens if the main channel $W^*$ is not noiseless? Suppose that Bob attempts to use the successive cancellation decoder, as in Section V. Then, according to Theorem 2 and Proposition 3, Bob's probability of error is upper-bounded by the sum of the Bhattacharyya parameters $Z(W_i^*)$ of those bit-channels that are not fixed (to zero). The index set of the bit-channels that are not fixed in our strong-security coding scheme is given by

$$\mathcal{A} \cup \mathcal{R} = \mathcal{G}_n(W^*,\beta) \cup \mathcal{X}$$

where $\mathcal{X}$ is the set defined in (94). The sum of the Bhattacharyya parameters $Z(W_i^*)$ over the set $\mathcal{G}_n(W^*,\beta)$ is bounded by $2^{-n^\beta}$ by definition, and therefore

$$\Pr\{\hat{U} \ne U\} \;\le\; 2^{-n^\beta} + \sum_{i \in \mathcal{X}} Z(W_i^*) \tag{97}$$



Unfortunately, we are not aware of any useful bounds on the sum $\sum_{i \in \mathcal{X}} Z(W_i^*)$. Thus we do not have a proof that the strong-security coding scheme satisfies the reliability condition (1).

Observe, however, that the security and reliability requirements are fundamentally different. Whether we're interested in the theory or in the practice of wiretap channels, security always requires a proof. It cannot be established through a computational procedure, such as simulation. On the other hand, reliability can be (and often is) established through computation in practice. For example, it is very common to rely on simulations to verify the reliability performance of error-correcting codes. We believe that, in practice, our strong-security coding scheme can be decoded to achieve low probabilities of error.

First, the following proposition shows that if $W$ is degraded with respect to $W^*$, then the set $\mathcal{X}$ that gives us trouble in successive cancellation decoding is small.

Proposition 22. Let $\mathcal{X}$ be the index set of bit-channels that are not $\delta_n$-poor for Eve yet bad for Bob, as defined in (94). Then

$$\lim_{n\to\infty} \frac{|\mathcal{X}|}{n} = 0$$

for any security function $\delta_n$ that satisfies (87), provided Eve's channel $W$ is degraded with respect to Bob's channel $W^*$.

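Proof. We claim that the sets $\mathcal{X}$, $\mathcal{G}_n(W^*,\beta)$, and $\mathcal{P}_n(W^*,\delta_n)$ are pairwise disjoint, and therefore

$$\frac{|\mathcal{X}|}{n} + \frac{|\mathcal{G}_n(W^*,\beta)|}{n} + \frac{|\mathcal{P}_n(W^*,\delta_n)|}{n} \;\le\; 1 \tag{98}$$

Since $\mathcal{X} \subseteq \mathcal{B}_n(W^*,\beta)$ by definition, and $\mathcal{B}_n(W^*,\beta)$ is the complement of $\mathcal{G}_n(W^*,\beta)$, it is clear that $\mathcal{X} \cap \mathcal{G}_n(W^*,\beta) = \varnothing$. It is also clear that $\mathcal{P}_n(W^*,\delta_n) \cap \mathcal{G}_n(W^*,\beta) = \varnothing$, as in (92). Furthermore, since $\mathcal{X} \subseteq \mathcal{R}$ and $\mathcal{R} = [n] \setminus \mathcal{P}_n(W,\delta_n)$ by definition, we have $\mathcal{X} \cap \mathcal{P}_n(W,\delta_n) = \varnothing$. Consequently, in order to prove our claim in (98), it would suffice to show that

$$\mathcal{P}_n(W^*,\delta_n) \subseteq \mathcal{P}_n(W,\delta_n)$$

But this follows immediately from Lemma 4 along with the definition of the set of $\delta_n$-poor channels in (50). Now observe that as $n \to \infty$, the fraction $|\mathcal{G}_n(W^*,\beta)|/n$ converges to the capacity $C(W^*)$ by Theorem 1, whereas the fraction $|\mathcal{P}_n(W^*,\delta_n)|/n$ converges to $1 - C(W^*)$ by Proposition 20. Together with (98), this completes the proof of the proposition. ∎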
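The disjointness claims in the proof of Proposition 22 can be exercised on the same kind of toy instance. The Python sketch below again assumes BEC channels, where degradation $\epsilon \ge \epsilon^*$ gives $Z(W_i) \ge Z(W_i^*)$ for every bit-channel because both recursion maps $Z \mapsto 2Z - Z^2$ and $Z \mapsto Z^2$ are increasing, so that $\mathcal{P}_n(W^*,\delta) \subseteq \mathcal{P}_n(W,\delta)$, as Lemma 4 guarantees in general. It uses the assumed forms $\mathcal{G}_n(W^*,\beta) = \{i : Z(W_i^*) < 2^{-n^\beta}/n\}$ and $\mathcal{P}_n(W,\delta) = \{i : C(W_i) \le \delta\}$ for (13) and (50), reports $|\mathcal{X}|/n$ together with the right-hand side of (97), and the parameter values are hypothetical.

```python
def bec_bhattacharyya(eps, m):
    """Exact Bhattacharyya parameters of the 2^m bit-channels of BEC(eps)."""
    z = [eps]
    for _ in range(m):
        z = [w for x in z for w in (2 * x - x * x, x * x)]
    return z


def proposition22_check(eps_bob, eps_eve, m, beta, delta):
    """Return |X|/n and the right-hand side of (97) for W* = BEC(eps_bob), W = BEC(eps_eve)."""
    if eps_eve < eps_bob:
        raise ValueError("W must be degraded with respect to W*: need eps_eve >= eps_bob")
    n = 2**m
    everything = set(range(n))
    z_bob = bec_bhattacharyya(eps_bob, m)
    z_eve = bec_bhattacharyya(eps_eve, m)
    threshold = 2 ** (-(n**beta)) / n                            # assumed form of (13)
    good_bob = {i for i in everything if z_bob[i] < threshold}   # G_n(W*, beta)
    poor_bob = {i for i in everything if 1 - z_bob[i] <= delta}  # P_n(W*, delta)
    poor_eve = {i for i in everything if 1 - z_eve[i] <= delta}  # P_n(W, delta)
    x_set = (everything - poor_eve) - good_bob                   # X = R & B_n(W*, beta)
    # Facts used in the proof: P_n(W*) lies inside P_n(W), hence X, G_n(W*, beta)
    # and P_n(W*, delta) are pairwise disjoint, which gives (98).
    assert poor_bob <= poor_eve
    assert not (x_set & good_bob) and not (x_set & poor_bob) and not (good_bob & poor_bob)
    sc_bound = 2 ** (-(n**beta)) + sum(z_bob[i] for i in x_set)  # right-hand side of (97)
    return len(x_set) / n, sc_bound


for m in (8, 10, 12, 14):
    frac, bound = proposition22_check(0.1, 0.4, m, beta=0.3, delta=0.01)
    print(f"n = 2^{m}: |X|/n = {frac:.4f}, bound (97) = {bound:.3g}")
```

The sketch does not bound $\sum_{i\in\mathcal{X}} Z(W_i^*)$; it simply evaluates that sum, consistent with the remark that no useful analytic bound on it is known.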

Depending upon how small the set $\mathcal{X}$ turns out to be in practice, various decoding solutions are potentially applicable.

First, it is quite possible that $\mathcal{X} = \varnothing$ in many situations, especially if the main channel is much better than the wiretap channel. In this case, successive cancellation decoding can be used "as is," and Bob's probability of error is at most $2^{-n^\beta}$.

The following example illustrates this situation. In order to 
obtain the numerical values given in this example, we have used 
the methods of [38] to evaluate the polar bit-channels. 

Example. Suppose that both the main channel and the wiretap channel are binary symmetric channels, say $C_1 = \mathrm{BSC}(p_1)$ and $C_2 = \mathrm{BSC}(p_2)$. Let us further assume that $p_1 = 10^{-3}$ and that the error-rate required at the output of the main-channel decoder is $10^{-9}$. This is often the case in optical fiber communications [32]. We will use the polar transformation of length $n = 2^{20}$. Indeed, codes of this length are already in use today in proprietary 100GbE fiber-optic systems. We also adopt the following stringent security criterion: we require that the mutual information $I(U;Z)$ between messages at the input to our encoder and observations at the output of the wiretap channel is less than $10^{-30}$. Using the methods developed in this section, we can simultaneously guarantee reliability of $10^{-9}$ and security of $10^{-30}$ at communication rates close to the secrecy capacity. The following table:



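$$
\begin{array}{c|c|c}
p_2 & \text{Rate } R & \%\text{ of } C_s \\
\hline
0.45 & 0.933 & 95.1\% \\
0.40 & 0.882 & 91.9\% \\
0.35 & 0.817 & 88.5\% \\
0.30 & 0.738 & 84.8\% \\
0.25 & 0.647 & 80.9\% \\
0.20 & 0.543 & 76.4\% \\
0.15 & 0.425 & 71.1\% \\
0.10 & 0.293 & 64.0\%
\end{array}
\tag{99}
$$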



summarizes these rates as a function of the bit error-rate $p_2$ of the wiretap channel. The third column in the table gives the ratio $R/C_s$, expressed as a percentage. Notably, in all of these cases, we have $\mathcal{X} = \varnothing$. In fact, the set $\mathcal{X}$ remains empty until the bit error-rate on the wiretap channel decreases down to $p_2 = 0.066$, in which case $|\mathcal{X}| = 1$. But the secrecy capacity for $p_2 \le 0.066$ is less than $0.3395$, which is probably too small to be of practical interest (in fiber-optic communications).
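The third column of (99) can be recomputed from the first two. For binary symmetric channels with $p_1 < p_2 \le 1/2$, the secrecy capacity is $C_s = C(\mathrm{BSC}(p_1)) - C(\mathrm{BSC}(p_2)) = h(p_2) - h(p_1)$, where $h(\cdot)$ is the binary entropy function. The short Python sketch below does this for $p_1 = 10^{-3}$ and also evaluates $C_s$ at $p_2 = 0.066$. Since the tabulated rates are rounded to three decimals, the recomputed percentages may differ from the table in the last digit.

```python
from math import log2


def h2(p):
    """Binary entropy function in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


def secrecy_capacity(p1, p2):
    """C_s = C(BSC(p1)) - C(BSC(p2)) = h(p2) - h(p1) for p1 < p2 <= 1/2."""
    return h2(p2) - h2(p1)


P1 = 1e-3
TABLE_99 = [  # (p2, rate R, percentage of C_s) as reported in (99)
    (0.45, 0.933, 95.1), (0.40, 0.882, 91.9), (0.35, 0.817, 88.5),
    (0.30, 0.738, 84.8), (0.25, 0.647, 80.9), (0.20, 0.543, 76.4),
    (0.15, 0.425, 71.1), (0.10, 0.293, 64.0),
]

for p2, rate, pct in TABLE_99:
    cs = secrecy_capacity(P1, p2)
    print(f"p2={p2:.2f}  C_s={cs:.4f}  R/C_s={100 * rate / cs:.1f}%  (table: {pct}%)")

# Secrecy capacity at p2 = 0.066, where the set X first becomes nonempty.
print(f"C_s at p2 = 0.066: {secrecy_capacity(P1, 0.066):.4f}")
```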

Now suppose that the set $\mathcal{X}$ is nonempty. Observe that this set is fixed and known a priori to all the parties (Alice, Bob, and Eve). In successive cancellation decoding, Bob makes his decisions $\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_n$ sequentially using the decision rule (12). Bob will know a priori that this decision rule is unreliable whenever an index $i \in \mathcal{X}$ is reached. Therefore, Bob could follow both alternatives $\hat{v}_i = 0$ and $\hat{v}_i = 1$ for all $i \in \mathcal{X}$. Doing so increases the decoding complexity by a factor of $2^{|\mathcal{X}|}$. But if $|\mathcal{X}|$ is a small constant (say, $\mathcal{X}$ contains only a couple of bit-channels), this is not unreasonable.

What can Bob do if the set $\mathcal{X}$ is larger? It is well known that in successive cancellation decoding, a single incorrect decision affects all the following decisions, making them unreliable. We propose to take advantage of this phenomenon in order to reduce the decoding complexity. Let $i_1$ be the smallest index in $\mathcal{X}$. Suppose that upon branching with $\hat{v}_{i_1} = 0$ and $\hat{v}_{i_1} = 1$, the decoder begins to compute the channel noise (e.g., the Hamming distance to the received vector on a BSC) accumulated along each of the two decision paths being followed. Due to the "error propagation" induced by the incorrect decision at $i_1$, we expect this estimated noise to accumulate rapidly along the incorrect path. On the other hand, along the correct path, the channel noise should accumulate slowly, governed by the statistics of $W^*$ that are known a priori. This means that the decoder can detect, with high probability, which of the two paths being followed is incorrect. Once the decoder finds that the path with the higher accumulated noise is sufficiently unlikely, according to the channel statistics, this path can be safely discarded.
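The pruning rule just described can be sketched as follows. The sketch assumes a $\mathrm{BSC}(p)$ main channel and assumes that each decision path has already been summarized by two counts: the number $k$ of received positions compared so far and the number $d$ of disagreements among them. How $k$ and $d$ are derived from the partial successive cancellation decisions is left abstract, as in the text. The function `prune_paths`, the tail-probability threshold `tau`, and the cap `max_paths` (playing the role of $M$ below) are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, d, p):
    """P[Bin(k, p) >= d], summed in the log domain so large k cannot overflow."""
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    if d <= 0:
        return 1.0
    total = 0.0
    for j in range(d, k + 1):
        log_pmf = (lgamma(k + 1) - lgamma(j + 1) - lgamma(k - j + 1)
                   + j * log(p) + (k - j) * log1p(-p))
        total += exp(log_pmf)
    return total


def prune_paths(paths, p, tau, max_paths):
    """Keep decision paths whose accumulated noise is plausible under BSC(p).

    paths: list of (label, k, d) tuples, d disagreements in k compared positions.
    A path is dropped when P[Bin(k, p) >= d] < tau. At most max_paths survivors
    are returned, least noisy first; if every path looks implausible, the least
    noisy one is kept so the decoder always retains a hypothesis.
    """
    ranked = sorted(paths, key=lambda path: path[2])
    survivors = [path for path in ranked
                 if binomial_upper_tail(path[1], path[2], p) >= tau]
    if not survivors:
        survivors = ranked[:1]
    return survivors[:max_paths]


# Two branches opened at i_1: the correct one accumulates noise slowly, while
# the incorrect one accumulates it quickly because of error propagation.
paths = [("v=0", 200, 3), ("v=1", 200, 41)]
print(prune_paths(paths, p=1e-2, tau=1e-9, max_paths=4))  # [('v=0', 200, 3)]
```

Keeping the least noisy path when every candidate looks implausible is one possible design choice; it prevents the decoder from discarding all of its hypotheses at once.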

Of course, it is possible that the second smallest index $i_2 \in \mathcal{X}$ is reached before one of the two paths opened at $i_1 \in \mathcal{X}$ can be discarded. In this case, the decoder would need to begin following four paths. Once the third index $i_3 \in \mathcal{X}$ is reached, the decoder might be following 1, 2, 3, or 4 paths. And so on. In practice, one could design the decoder to follow at most $M$ paths, where $M$ is a pre-determined limit dictated by decoder complexity considerations. The situation is quite similar to decision-feedback equalization on ISI channels using a bank of $M$ zero-forcing DFEs. That scenario was analyzed in [43], where it is shown that error-propagation caused by incorrect decisions can be used to discard erroneous decision-feedback paths. It is also shown in [43] that, in practice, small values of $M$ often suffice to achieve very good performance.

We have limited our consideration herein to successive cancellation decoding, or variants thereof. It is also possible that other methods of decoding polar codes (such as belief propagation [18] or recursive-list decoding [13]) may be relatively robust to not fixing a small set $\mathcal{X}$ of bad channels. Analysis of such decoders is a research problem of independent interest.



VII. Discussion and Open Problems 

We briefly mention certain straightforward extensions of our results. So far, we have considered exclusively binary-input symmetric wiretap channels. However, it is well known that the polarization phenomenon extends to other types of channels. Arıkan shows in [3] that, given a non-symmetric binary-input channel $W$, polar codes achieve its symmetric capacity $I(W)$ in the average sense. This implies that our coding scheme achieves the symmetric capacity difference $I(W^*) - I(W)$, also in the average sense. Specifically, suppose we modify our encoding algorithm in Section IV as follows. Instead of constructing the vector $v \in \{0,1\}^n$ by setting $v_{\mathcal{R}} = e$, $v_{\mathcal{A}} = u$, and $v_{\mathcal{B}} = 0$, we set $v_{\mathcal{R}} = e$, $v_{\mathcal{A}} = u$, and $v_{\mathcal{B}} = s$, where $s$ is a fixed binary vector known a priori to all the parties. Then there exists some choice of $s$ such that Theorem 8 and Theorem 9 hold. Based upon the results of Arıkan in [3], exactly the same proof as before (Lemma 6 and Lemma 7) applies.

Our results also extend to discrete memoryless channels with non-binary input. It was recently proved in [35] that channels with an input alphabet of prime size $q$ are polarized by the same transformation, and the corresponding versions of Theorem 1 and Theorem 2 hold. The probability of error under successive cancellation decoding still scales as $O(2^{-n^\beta})$ for all prime $q$. This means that our proof of Lemmas 6 and 7 goes through essentially "as is" (as long as $r$ is replaced by $r \log_2 q$ throughout). If the size $q$ of the input alphabet is not prime, polarization requires either a randomized permutation on the input or multilevel coding (see [35] for more details). It can be shown that our results in Section V extend to this case as well.

It is not clear whether the strong-security results of the previ- 
ous section can be similarly extended to non-symmetric and/or 
to non-binary-input wiretap channels. We believe they can, and 
pose a proof of this as an open problem. 

Another open problem of great interest is how to code for the situation where the wiretap channel $W$ is not degraded with respect to the main channel $W^*$. Note that channel degradation is sufficient but, to the best of our knowledge, not necessary for our coding scheme to work. What seems to be necessary is that the set of bit-channels that are "good" for Eve but "bad" for Bob is either empty (as in Section V) or at least very small (as in Section VI). Unfortunately, in the general case, there is no reason why the number of such bit-channels could not be large.

Finally, we point out that all the constructions in this paper are only as explicit as the polar codes themselves. An exact algorithm for computing the sets $\mathcal{G}_n(W,\beta)$ and $\mathcal{P}_n(W,\delta)$ in (13) and (50) was given by Arıkan in [3]. However, this algorithm requires time and memory that grow exponentially with the code length $n$. Since then, several heuristic algorithms for this problem have been proposed [2, 30, 31]. However, these algorithms do not provide useful guarantees on the quality of their output. Such guarantees are clearly essential to establish the security of our coding scheme. Fortunately, the problem has been resolved in [38]. The algorithm of [38] runs in linear time, and makes it possible to compute upper and lower bounds on the capacity of polar bit-channels with an arbitrary degree of precision. For example, to establish the results reported in (99), we have run the algorithm of [38] with a precision of 300 bits (which is necessary to provide meaningful guarantees of security down to $10^{-30}$).